java - Parallel training for Apache Spark MLlib Random Forest
I have a Java application that trains an MLlib random forest (`org.apache.spark.mllib.tree.RandomForest`) on a training set of 200k samples. I've noticed that only 1 CPU core is utilised during training. Given that a random forest is an ensemble of N decision trees, one would think the trees could be trained in parallel, utilising all available cores. Is there a configuration option, API call, or anything else that can enable parallel training of the decision trees?
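For context, a minimal sketch of the kind of training code in question, using the RDD-based MLlib API. The synthetic data, class name, and parameter values are illustrative assumptions, not the asker's actual code:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.mllib.linalg.Vectors;
import org.apache.spark.mllib.regression.LabeledPoint;
import org.apache.spark.mllib.tree.RandomForest;
import org.apache.spark.mllib.tree.model.RandomForestModel;

public class RandomForestTraining {

    /** Trains a small random forest on synthetic data with a single-threaded local master. */
    public static RandomForestModel train(int numTrees) {
        // "local" gives Spark exactly one worker thread -- the setup described above
        SparkConf conf = new SparkConf()
                .setAppName("rf-training")
                .setMaster("local");
        JavaSparkContext sc = new JavaSparkContext(conf);
        try {
            // Tiny synthetic training set standing in for the real 200k samples
            List<LabeledPoint> points = new ArrayList<>();
            for (int i = 0; i < 100; i++) {
                double label = i % 2;
                points.add(new LabeledPoint(label,
                        Vectors.dense(i, label * 10.0, -i)));
            }
            JavaRDD<LabeledPoint> trainingData = sc.parallelize(points);

            Map<Integer, Integer> categoricalFeaturesInfo = new HashMap<>(); // all continuous
            int numClasses = 2;
            String featureSubsetStrategy = "auto";
            String impurity = "gini";
            int maxDepth = 5;
            int maxBins = 32;
            int seed = 12345;

            return RandomForest.trainClassifier(
                    trainingData, numClasses, categoricalFeaturesInfo, numTrees,
                    featureSubsetStrategy, impurity, maxDepth, maxBins, seed);
        } finally {
            sc.stop();
        }
    }

    public static void main(String[] args) {
        RandomForestModel model = train(10);
        System.out.println("Trained " + model.numTrees() + " trees");
    }
}
```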
I found the answer to this. The issue was how I set up the Spark configuration, using `SparkConf.setMaster("local")`. I changed it to `SparkConf.setMaster("local[16]")` to use 16 threads, as per the Javadoc.
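In code, the change is a one-liner (the app name here is a placeholder; 16 should match your core count, or `local[*]` can be used to pick it up automatically):

```java
import org.apache.spark.SparkConf;

public class MasterConfig {
    public static void main(String[] args) {
        // Before: "local" runs everything in a single worker thread
        SparkConf serial = new SparkConf()
                .setAppName("rf-training")
                .setMaster("local");

        // After: 16 local worker threads, so partitions are processed in parallel
        SparkConf parallel = new SparkConf()
                .setAppName("rf-training")
                .setMaster("local[16]");

        // Alternatively, "local[*]" uses as many threads as there are cores
        SparkConf allCores = new SparkConf()
                .setAppName("rf-training")
                .setMaster("local[*]");

        System.out.println(parallel.get("spark.master")); // prints "local[16]"
    }
}
```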
Now training is running far quicker, and an Amazon datacentre in Virginia is a little hotter :)
A typical case of RTFM, but in my defence the use of `setMaster()` for this seems a bit hacky to me. A better design would be a separate method for setting the number of local threads/cores to use.