multithreading, python-3.x, parallel-processing, gil, dill

Parallelizing a Python 3 program with huge complex objects


Intro

I have a fairly complex Python program (more than 5,000 lines) written in Python 3.6. It parses a dataset of more than 5,000 files, processes them to build an internal representation of the dataset, and then computes statistics. Since I have to test the model, I need to save the dataset representation, and for now I do it through serialization with dill (the representation contains objects that pickle does not support). The serialization of the whole dataset, uncompressed, takes about 1 GB.
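To make the setup concrete, here is a minimal sketch of this kind of dill-based round trip. The class, file name, and lambda member are hypothetical stand-ins for the real representation; the point is only that dill can handle members that standard pickle cannot:

```python
import dill

# Hypothetical dataset representation with a member (a lambda) that the
# standard pickle module cannot serialize, but dill can.
class DatasetRepresentation:
    def __init__(self):
        self.records = []
        self.score = lambda r: len(r)  # lambdas are dill-only

    def add(self, record):
        self.records.append(record)

dataset = DatasetRepresentation()
dataset.add("example record")

# Serialize the whole representation to disk (uncompressed, as in the question).
with open("dataset.dill", "wb") as fh:
    dill.dump(dataset, fh)

# Later, reload it to test the model.
with open("dataset.dill", "rb") as fh:
    dataset = dill.load(fh)
```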

The problem

Now I would like to speed up the computation through parallelization. The ideal approach would be multithreading, but the GIL forbids that. The multiprocessing module (and multiprocess, which is dill-compatible, as well) uses serialization to share complex objects between processes, so in the best case I managed to come up with, parallelization has no effect on running time because of the huge size of the dataset.
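The following sketch illustrates the bottleneck under a simplified assumption: the worker, the toy dataset, and the chunking scheme are hypothetical, but they show how passing the representation to each task forces it to be pickled and unpickled per task, which is what eats the parallel speed-up when the object is around 1 GB:

```python
from multiprocessing import Pool  # `multiprocess` exposes the same API but pickles with dill

# Hypothetical worker: every call receives the full dataset as an argument,
# so the whole representation is serialized and deserialized for each task.
def compute_statistics(args):
    dataset, chunk_id = args
    return sum(len(r) for r in dataset["records"][chunk_id::4])

if __name__ == "__main__":
    dataset = {"records": ["a", "bb", "ccc", "dddd"]}  # stand-in for the ~1 GB object
    with Pool(processes=4) as pool:
        partials = pool.map(compute_statistics, [(dataset, i) for i in range(4)])
    print(sum(partials))
```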

The question

What is the best way to manage this situation?

I know about posh, but it seems to be x86-only; ray, but it uses serialization too; gilectomy (a version of Python without the GIL), but I was not able to make it parallelize threads; and Jython, which has no GIL but is not compatible with Python 3.x.

I am open to any alternative, in any language, however complex it may be, but I can't rewrite the code from scratch.


Solution

  • The best solution I found is to switch from dill to a custom pickling procedure based on the standard pickle module, as sketched below. See here: Python 3.6 pickling custom procedure
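One common way to implement such a custom procedure is to define `__getstate__`/`__setstate__` on the classes that standard pickle chokes on. The class and lambda member below are hypothetical; the linked question may use a different mechanism (e.g. `__reduce__` or a Pickler subclass), so this is only a sketch of the general idea:

```python
import pickle

# Hypothetical class whose lambda member would break standard pickle;
# __getstate__/__setstate__ drop the non-picklable part before dumping
# and rebuild it after loading, so plain pickle can handle the object.
class Node:
    def __init__(self, payload):
        self.payload = payload
        self.score = lambda: len(self.payload)  # not picklable as-is

    def __getstate__(self):
        state = self.__dict__.copy()
        del state["score"]          # strip what pickle cannot serialize
        return state

    def __setstate__(self, state):
        self.__dict__.update(state)
        self.score = lambda: len(self.payload)  # rebuild it after loading

node = pickle.loads(pickle.dumps(Node("example")))
print(node.score())  # -> 7
```

Since the objects then go through standard pickle, they can also be passed to multiprocessing workers without depending on dill.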