I have a project in which joblib works well on a single computer: it sends functions to different cores effectively.
Now I've been assigned to do the same thing on a Databricks cluster. I've tried this many ways today, but the problem is that the jobs do not spread out one per compute node. I have 4 executors and set n_jobs=6, but when I send 4 jobs through, some of them pile up on the same node, leaving other nodes unused. (Screenshot of the Databricks Spark UI omitted.) Sometimes when I try this, I get 1 job running on a node by itself while all the rest are piled onto one node.
In the joblib and joblibspark docs, I see the batch_size parameter, which controls how many tasks are dispatched to each worker at once. Even when I set it to 1, I get the same problem: nodes sit unused.
from joblib import Parallel, delayed
from joblibspark import register_spark

# Make the "spark" backend available to joblib
register_spark()

# Fit one model per element of ZZ, distributed over the cluster
output = Parallel(backend="spark", n_jobs=6,
                  verbose=config.JOBLIB_VERBOSE, batch_size=1)(
    delayed(fit_one)(x, model_data=model_data, dlmodel=dlmodel,
                     outdir=outdir, frac=sample_p,
                     score_type=score_type, save=save,
                     verbose=verbose)
    for x in ZZ
)
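For reference, here is a stripped-down diagnostic sketch (the function name and sleep time are illustrative, not from my real job) that just reports which node each task lands on; repeated hostnames in the output would show the same piling I see in the Spark UI:

import socket
import time

from joblib import Parallel, delayed
from joblibspark import register_spark

register_spark()

def where_am_i(i):
    # Keep the task alive long enough that tasks overlap in time
    time.sleep(5)
    return i, socket.gethostname()

# With 4 executors, ideally these 4 tasks land on 4 different hostnames
placements = Parallel(backend="spark", n_jobs=4, batch_size=1)(
    delayed(where_am_i)(i) for i in range(4)
)
print(placements)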
I've hacked at this all day, trying various backends and combinations of settings. What am I missing?
Update: releases of Spark and joblibspark in mid-2022 solved this.
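For anyone hitting this later: I don't have the exact fixed version numbers handy, but a quick way to check what's installed on the cluster (standard-library importlib.metadata, nothing project-specific) is:

from importlib.metadata import version
import pyspark

# Compare these against the mid-2022 releases of each package
print("pyspark:", pyspark.__version__)
print("joblibspark:", version("joblibspark"))
print("joblib:", version("joblib"))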