I'm using the Parallel function from joblib to parallelize a task. All of the processes take the same pandas dataframe as input. Is it possible to share this dataframe between them in order to reduce the run-time memory usage? All of the processes only read from it. I found a similar solution, but for a numpy array and using multiprocessing, here: Shared-memory objects in multiprocessing
This is a snippet of the code:
from joblib import Parallel, delayed

def fun(df, cat):
    # split the rows that do and do not belong to the current category
    a = df[df['y'] != cat]
    b = df[df['y'] == cat]
    ...

output = Parallel(n_jobs=-1)(delayed(fun)(df, cat) for cat in labels)
Here df is a pandas dataframe and labels is just a list of the category values.
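For reference, the numpy trick from the linked question carries over to joblib almost unchanged, because joblib passes memory-mapped arrays to the workers by file name instead of copying their contents. A minimal sketch, assuming the dataframe holds a single numeric dtype (to_numpy() on a mixed frame gives an object array, which cannot be memory-mapped); fun_mm and the file name df_values.joblib are just illustrative:

import pandas as pd
from joblib import Parallel, delayed, dump, load

def fun_mm(values, columns, cat):
    # rebuild a read-only dataframe view on top of the shared buffer
    df = pd.DataFrame(values, columns=columns)
    a = df[df['y'] != cat]
    b = df[df['y'] == cat]
    return len(a), len(b)  # placeholder for the real work

# write the values to disk once, then reopen them memory-mapped;
# every worker maps the same file instead of receiving a copy
dump(df.to_numpy(), 'df_values.joblib')
values = load('df_values.joblib', mmap_mode='r')

output = Parallel(n_jobs=-1)(
    delayed(fun_mm)(values, df.columns, cat) for cat in labels
)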
I solved it by passing the filtered dataframes directly:
output = Parallel(n_jobs=-1)(delayed(fun)(df[df['target'] == cat],
                                          df[df['target'] != cat],
                                          cat) for cat in labels)
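This helps because every worker now pickles only the rows it actually needs instead of the full dataframe once per label, although the parent process still builds each slice. If the data is numeric, joblib can also share it with no explicit dump/load step: with the default process-based backend, Parallel automatically memory-maps numpy arguments larger than max_nbytes (1 MB by default) into a shared temporary folder. A minimal sketch of that variant, assuming the target column is numeric and fun_np is an illustrative stand-in for the real work:

from joblib import Parallel, delayed

values = df.to_numpy()              # one shared ndarray
col = df.columns.get_loc('target')  # position of the filter column

def fun_np(values, col, cat):
    mask = values[:, col] == cat
    return values[mask], values[~mask]  # placeholder for the real work

# arguments above max_nbytes are memmapped once and opened read-only
output = Parallel(n_jobs=-1, max_nbytes='1M', mmap_mode='r')(
    delayed(fun_np)(values, col, cat) for cat in labels
)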