Tags: python, python-3.x, joblib

Joblib crashes after exactly 2 × n_jobs calls


Joblib is crashing with the following error:

  Parallel(n_jobs=-1, prefer="threads", verbose=10)(
  File "/home/developer/.local/lib/python3.8/site-packages/joblib/parallel.py", line 1054, in __call__
    self.retrieve()
  File "/home/developer/.local/lib/python3.8/site-packages/joblib/parallel.py", line 933, in retrieve
    self._output.extend(job.get(timeout=self.timeout))
  File "/usr/lib/python3.8/multiprocessing/pool.py", line 771, in get
    raise self._value
  File "/usr/lib/python3.8/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "/home/developer/.local/lib/python3.8/site-packages/joblib/_parallel_backends.py", line 595, in __call__
    return self.func(*args, **kwargs)
  File "/home/developer/.local/lib/python3.8/site-packages/joblib/parallel.py", line 262, in __call__
    return [func(*args, **kwargs)
  File "/home/developer/.local/lib/python3.8/site-packages/joblib/parallel.py", line 263, in <listcomp>
    for func, args, kwargs in self.items]
TypeError: cannot unpack non-iterable function object

on this snippet of code (some names have been changed to hide sensitive information):

    with open(inputFile) as file:
        csv_reader = csv.DictReader(
            file, fieldnames=["Header1", "Header2"])
        Parallel(n_jobs=3, prefer="threads", verbose=10)(
            delayed(pullSummaryData(row["Header1"]))
            for row in csv_reader
        )

The interesting part is that it always crashes after pullSummaryData has been called exactly 2 * n_jobs times. With n_jobs=3, pullSummaryData is called 6 times before crashing.
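That count lines up with joblib's default pre_dispatch='2 * n_jobs': joblib eagerly pulls 2 * n_jobs items from the task generator before dispatching work, and because pullSummaryData(row["Header1"]) is evaluated while each generator item is built, the function runs once per pre-dispatched item before the unpacking error is raised. A minimal stdlib-only sketch of that consumption pattern (the names and counts mirror the question; no joblib required):

```python
import itertools

calls = []

def pullSummaryData(arg):
    # Stand-in for the real function; records each call.
    calls.append(arg)
    return arg

# The buggy pattern calls pullSummaryData *while the generator is consumed*,
# because pullSummaryData(i) runs before delayed() ever sees it.
tasks = (pullSummaryData(i) for i in range(100))

# joblib's default pre_dispatch is '2 * n_jobs', so with n_jobs=3 it pulls
# 6 items from the generator up front, then fails when it tries to unpack them.
n_jobs = 3
list(itertools.islice(tasks, 2 * n_jobs))

print(len(calls))  # 6: the function ran exactly 2 * n_jobs times
```

This is why the crash point scales with n_jobs rather than with the size of the CSV.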

Joblib v1.0.1

csv v1.0

Python v3.8.5


Solution

  • Try changing delayed(pullSummaryData(row["Header1"])) to delayed(pullSummaryData)(row["Header1"]). In the original form, pullSummaryData is called immediately and its return value is passed to delayed; the wrapper that delayed returns is then never called, so each task ends up being a bare function object instead of the (function, args, kwargs) tuple joblib expects, which is exactly what the "cannot unpack non-iterable function object" error complains about. delayed must wrap the function itself, with the arguments applied to the wrapper.

    Ref: Document

    Answer based on user696969's comment under original post.
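For reference, joblib.delayed essentially captures a call as a (function, args, kwargs) tuple. A simplified sketch (not joblib's exact implementation) makes the difference between the two call forms concrete:

```python
def delayed(function):
    """Simplified stand-in for joblib.delayed: capture a call as a task tuple."""
    def wrapper(*args, **kwargs):
        return function, args, kwargs
    return wrapper

def pullSummaryData(value):
    # Hypothetical stand-in for the question's function.
    return value.upper()

# Correct: wrap the function, then apply the arguments.
func, args, kwargs = delayed(pullSummaryData)("header1")
assert (func, args, kwargs) == (pullSummaryData, ("header1",), {})

# Buggy: pullSummaryData runs immediately; delayed() wraps its return
# value, and the returned wrapper is never called, so the "task" is a
# bare function object that cannot be unpacked into (func, args, kwargs).
task = delayed(pullSummaryData("header1"))
assert callable(task)  # a function, not a tuple: the source of the TypeError
```

Inside Parallel, joblib iterates the tasks with "for func, args, kwargs in self.items", which is why handing it an uncalled wrapper raises the unpacking TypeError seen in the traceback.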