Tags: python, scikit-learn, pickle, joblib, dill

Exception has occurred: ModuleNotFoundError when unpickling objects (using dill or pickle or joblib)


I've fitted a sklearn Pipeline and, now that I need to deploy it, I've pickled it (I tried joblib and dill too). When I unpickle it in the other environment, which has the same versions of dill, pickle and Python, I get this error:

exception has occurred: ModuleNotFoundError
No module named '_regex'
  File "\opt\miniconda\lib\python3.6\site-packages\dill\_dill.py", line 832, in _import_module
  File "\opt\miniconda\lib\python3.6\site-packages\dill\_dill.py", line 305, in load
  File "C:\<edited>\score.py", line 40, in init
  File "C:\<edited>\score.py", line 90, in <module>
  File "\opt\miniconda\lib\python3.6\runpy.py", line 85, in _run_code
  File "\opt\miniconda\lib\python3.6\runpy.py", line 96, in _run_module_code
  File "\opt\miniconda\lib\python3.6\runpy.py", line 263, in run_path
  File "\opt\miniconda\lib\python3.6\runpy.py", line 85, in _run_code
  File "\opt\miniconda\lib\python3.6\runpy.py", line 193, in _run_module_as_main

It's unclear to me what causes this. I can unpickle the file in my local environment, but not in the target environment.

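import dill as pickle

# Needed because of an earlier, unrelated issue (see below)
pickle._dill._reverse_typemap['ClassType'] = type

# pickle.load() reads from an open file; pickle.loads() expects bytes instead
with open(prep_transformer_path, 'rb') as file:
    prep_transformer = pickle.load(file)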

This is essentially the code that raises the error on the unpickling side. Any idea what I might be overlooking? I had to add the _reverse_typemap line because of another issue I solved before this one.

The pipeline contains a dozen fitted, home-grown Transformer classes.

The pickling code is as follows:

import os

import dill as pickle

# Dump the fitted preprocessing pipeline to prep.pkl
with open(os.path.join(output_models_directory, 'prep.pkl'), 'wb') as file:
    pickle.dump(trainingPrepPipe, file, protocol=pickle.HIGHEST_PROTOCOL)

Thanks in advance for any help!
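One way to narrow a problem like this down is to list which modules a pickle will ask the unpickler to import, then check each one in the target environment before calling load. The sketch below walks the opcode stream with the standard-library pickletools.genops, so nothing is imported or executed. The helper names (referenced_globals, top_level_modules, missing_modules) are hypothetical, and the parsing is best-effort: it assumes the opcode layout CPython's pickler normally emits. It is meant for plain pickle or dill byte streams, not joblib files, whose format can embed raw array data. Also note that dill can record a module as a plain string argument to its own helper (the traceback above goes through dill's _import_module), and this scan would only see that as a reference to dill itself.

```python
import importlib.util
import pickletools
import sys

STRING_OPS = {"UNICODE", "BINUNICODE", "SHORT_BINUNICODE", "BINUNICODE8"}
GET_OPS = {"GET", "BINGET", "LONG_BINGET"}
PUT_OPS = {"PUT", "BINPUT", "LONG_BINPUT"}
SKIP_OPS = {"PROTO", "FRAME", "STOP"}  # opcodes that leave the stack untouched


def referenced_globals(payload: bytes) -> set:
    """Best-effort set of (module, qualname) pairs the unpickler would import.

    Only walks the opcode stream; nothing is imported or executed.
    """
    found = set()
    memo = {}           # memo index -> memoized value (a str, or None for anything else)
    prev = last = None  # the two most recently pushed values
    for op, arg, _pos in pickletools.genops(payload):
        name = op.name
        if name in ("GLOBAL", "INST"):      # protocols 0-3: arg is "module qualname"
            module, qualname = arg.split(" ", 1)
            found.add((module, qualname))
            value = None
        elif name == "STACK_GLOBAL":        # protocol 4+: module and name are on the stack
            if isinstance(prev, str) and isinstance(last, str):
                found.add((prev, last))
            prev = last = None              # both strings are replaced by the looked-up object
            continue
        elif name == "MEMOIZE":             # memoize top of stack at the next free index
            memo[len(memo)] = last
            continue
        elif name in PUT_OPS:               # memoize top of stack at an explicit index
            memo[arg] = last
            continue
        elif name in SKIP_OPS:
            continue
        elif name in STRING_OPS:
            value = arg
        elif name in GET_OPS:               # module names are often reused from the memo
            value = memo.get(arg)
        else:
            value = None
        prev, last = last, value
    return found


def top_level_modules(payload: bytes) -> set:
    """Top-level package names the pickle depends on, e.g. 'sklearn' or 'numpy'."""
    return {module.split(".")[0] for module, _ in referenced_globals(payload)}


def missing_modules(payload: bytes) -> set:
    """Top-level names that cannot be found in the *current* environment."""
    return {
        m for m in top_level_modules(payload)
        if m not in sys.modules and importlib.util.find_spec(m) is None
    }
```

Running missing_modules on the bytes of prep.pkl, once locally and once in the target environment, would show which imports differ between the two.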


Solution

  • I encountered a similar issue when saving pipelines (using sklearn.joblib). It turns out that joblib doesn't store the code used by your pipeline: like pickle, it records each class and function only by its module path and name, and the unpickler imports that module again when loading. In my case, the problem was resolved by ensuring that, in production, all Python modules used by the pipeline and/or classifier are available and in the same location relative to the pipeline-creation module.

    For me, this meant copying my_transformers.py and saving it alongside the pipeline and classifier joblib files. Then, when deploying them to production, I placed my_transformers.py in the same location relative to the module that created my pipeline.
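The claim that only references are stored can be checked with the standard library alone. The sketch below uses plain pickle, which joblib builds on, and simulates both environments in one process: my_transformers is built in memory with types.ModuleType instead of being a real file, and the AddConstant class and the install_fake_module/try_load helpers are hypothetical stand-ins for the home-grown transformers.

```python
import pickle
import sys
import types

MODULE = "my_transformers"  # hypothetical module name, borrowed from the answer above

# Source of a hypothetical home-grown transformer
SOURCE = (
    "class AddConstant:\n"
    "    def __init__(self, c):\n"
    "        self.c = c\n"
    "    def transform(self, xs):\n"
    "        return [x + self.c for x in xs]\n"
)


def install_fake_module() -> types.ModuleType:
    """Build the module in memory and register it, as if my_transformers.py were importable."""
    mod = types.ModuleType(MODULE)
    exec(SOURCE, mod.__dict__)  # classes defined here get __module__ == MODULE
    sys.modules[MODULE] = mod
    return mod


def try_load(payload: bytes):
    """Return (obj, None) on success, or (None, name_of_missing_module) on failure."""
    try:
        return pickle.loads(payload), None
    except ModuleNotFoundError as exc:
        return None, exc.name


# "Training" side: the module is importable, so pickling works.
mod = install_fake_module()
payload = pickle.dumps(mod.AddConstant(10), protocol=pickle.HIGHEST_PROTOCOL)

# The payload names the module and class but contains none of their code.
print(MODULE.encode() in payload, b"AddConstant" in payload)  # True True
print(b"return [x" in payload)                                 # False

# "Production" side without the module: same exception type as in the question.
del sys.modules[MODULE]
print(try_load(payload))  # (None, 'my_transformers')

# Ship the module and make it importable again: loading now succeeds.
install_fake_module()
obj, missing = try_load(payload)
print(obj.transform([1, 2]), missing)  # [11, 12] None
```

In a real deployment, "make it importable again" means what the answer describes: my_transformers.py sits where the loading process can import it under the same module path it had when the pipeline was pickled.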