Tags: python, python-3.x, pandas, parallel-processing, joblib

Joblib persistence and Pandas


There is good documentation on persisting NumPy arrays with Joblib using memory-mapped files.

Recent versions of Joblib apparently persist and share NumPy arrays this way automatically.

Will Pandas data frames also be persisted automatically, or does the user need to implement persistence manually?


Solution

  • Yes. Pandas data frames are built on NumPy arrays, so they will be persisted as well.

    Joblib implements its optimized persistence by hooking into the pickle protocol. Any object whose pickled representation includes NumPy arrays benefits from Joblib's optimizations.
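    As a quick way to check this yourself, here is a minimal sketch that round-trips a data frame through `joblib.dump` and `joblib.load`. The helper name `roundtrip`, the file name `frame.joblib`, and the sample columns are made up for illustration. It only shows that a data frame goes through Joblib's persistence layer unchanged; it does not measure how much memory is shared between workers.

    ```python
    import os
    import tempfile

    import joblib
    import numpy as np
    import pandas as pd


    def roundtrip(df):
        """Dump a data frame with joblib and load it back from disk.

        Hypothetical helper for illustration only.
        """
        with tempfile.TemporaryDirectory() as tmp:
            path = os.path.join(tmp, "frame.joblib")  # arbitrary file name
            joblib.dump(df, path)     # pickles the frame via joblib's array-aware pickler
            return joblib.load(path)  # rebuilds the frame, including its NumPy data


    # Sample data, made up for this example.
    df = pd.DataFrame({
        "a": np.arange(1000, dtype=np.float64),
        "b": np.ones(1000),
    })

    restored = roundtrip(df)
    same = restored.equals(df)  # compares values and dtypes column by column
    print(same)
    ```

    No custom persistence code is needed here: the data frame is passed to Joblib exactly as a plain NumPy array would be.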