java - Parallel training for Apache Spark MLlib Random Forest
I have a Java application that trains an MLlib random forest (`org.apache.spark.mllib.tree.RandomForest`) on a training set of 200k samples. I've noticed that only 1 CPU core is utilised during training. Given that a random forest is an ensemble of N decision trees, one would think the trees could be trained in parallel, utilising all available cores. Is there a configuration option, API call, or anything else that can enable parallel training of the decision trees?
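For context, a minimal sketch of the kind of training code in question, using the RDD-based MLlib API. The synthetic data, class name, and parameter values are illustrative assumptions, not the asker's actual code:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.mllib.linalg.Vectors;
import org.apache.spark.mllib.regression.LabeledPoint;
import org.apache.spark.mllib.tree.RandomForest;
import org.apache.spark.mllib.tree.model.RandomForestModel;

public class RandomForestTraining {

    /** Trains a small random forest on synthetic data with a single-threaded local master. */
    public static RandomForestModel train(int numTrees) {
        // "local" gives Spark exactly one worker thread -- the setup described above
        SparkConf conf = new SparkConf()
                .setAppName("rf-training")
                .setMaster("local");
        JavaSparkContext sc = new JavaSparkContext(conf);
        try {
            // Tiny synthetic training set standing in for the real 200k samples
            List<LabeledPoint> points = new ArrayList<>();
            for (int i = 0; i < 100; i++) {
                double label = i % 2;
                points.add(new LabeledPoint(label,
                        Vectors.dense(i, label * 10.0, -i)));
            }
            JavaRDD<LabeledPoint> trainingData = sc.parallelize(points);

            Map<Integer, Integer> categoricalFeaturesInfo = new HashMap<>(); // all continuous
            int numClasses = 2;
            String featureSubsetStrategy = "auto";
            String impurity = "gini";
            int maxDepth = 5;
            int maxBins = 32;
            int seed = 12345;

            return RandomForest.trainClassifier(
                    trainingData, numClasses, categoricalFeaturesInfo, numTrees,
                    featureSubsetStrategy, impurity, maxDepth, maxBins, seed);
        } finally {
            sc.stop();
        }
    }

    public static void main(String[] args) {
        RandomForestModel model = train(10);
        System.out.println("Trained " + model.numTrees() + " trees");
    }
}
```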
I found the answer to this. The issue was how I set up the Spark configuration, using `SparkConf.setMaster("local")`. I changed it to `SparkConf.setMaster("local[16]")` to use 16 threads, as per the Javadoc.
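In code, the change is a one-liner (the app name here is a placeholder; 16 should match your core count, or `local[*]` can be used to pick it up automatically):

```java
import org.apache.spark.SparkConf;

public class MasterConfig {
    public static void main(String[] args) {
        // Before: "local" runs everything in a single worker thread
        SparkConf serial = new SparkConf()
                .setAppName("rf-training")
                .setMaster("local");

        // After: 16 local worker threads, so partitions are processed in parallel
        SparkConf parallel = new SparkConf()
                .setAppName("rf-training")
                .setMaster("local[16]");

        // Alternatively, "local[*]" uses as many threads as there are cores
        SparkConf allCores = new SparkConf()
                .setAppName("rf-training")
                .setMaster("local[*]");

        System.out.println(parallel.get("spark.master")); // prints "local[16]"
    }
}
```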
Now training is running far quicker, and an Amazon datacentre in Virginia is a little hotter :)
A typical case of RTFM, but in my defence the use of `setMaster()` for this seems a bit hacky to me. A better design would be a separate method for setting the number of local threads/cores to use.