
ModuleNotFoundError: No module named 'sklearn.preprocessing._data'


My question is similar to another one that was already asked.

I also use pickle to save and load a model, and I get the error below during pickle.load():

import pickle
from sklearn.preprocessing import StandardScaler
# SAVE
scaler = StandardScaler().fit(X_train)
X_trainScale = scaler.transform(X_train)
with open('scaler.scl', 'wb') as f:  # close the file once the scaler is written
    pickle.dump(scaler, f)

# =================
# LOAD
with open('scaler.scl', 'rb') as f:
    sclr = pickle.load(f)  # => ModuleNotFoundError: No module named 'sklearn.preprocessing._data'
X_testScale = sclr.transform(X_test)

ModuleNotFoundError: No module named 'sklearn.preprocessing._data'
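The error is raised while unpickling because a pickle records the module path of every class it contains, and pickle.load imports those paths. A way to check which paths a file expects, without importing or executing anything, is to scan its opcode stream with the standard pickletools module. This is a minimal sketch, assuming the file was produced by pickle.dump; the helper name referenced_globals is hypothetical, and memo handling is simplified to what pickle.dump emits for module and class names.

```python
import pickle
import pickletools
from typing import Dict, List, Optional, Set, Tuple

# Opcodes that push a str onto the unpickling stack.
_STRING_OPS = {"UNICODE", "SHORT_BINUNICODE", "BINUNICODE", "BINUNICODE8"}
_GET_OPS = {"GET", "BINGET", "LONG_BINGET"}
_PUT_OPS = {"PUT", "BINPUT", "LONG_BINPUT"}


def referenced_globals(data: bytes) -> Set[Tuple[str, str]]:
    """Return the (module, name) pairs a pickle would import on load.

    Only the opcode stream is read, so nothing is imported or executed.
    Assumes (hypothetically) that STACK_GLOBAL is always preceded by the
    module and name strings, which is how pickle.dump writes them.
    """
    found: Set[Tuple[str, str]] = set()
    memo: Dict[int, Optional[str]] = {}
    pushed: List[Optional[str]] = []  # string pushes; None = memo entry that was not a str
    top: Optional[str] = None         # str currently on top of the stack, if any

    for opcode, arg, _pos in pickletools.genops(data):
        name = opcode.name
        if name == "GLOBAL":  # protocols 0-3: arg is "module name"
            module, attr = arg.split(" ", 1)
            found.add((module, attr))
            top = None
        elif name in _STRING_OPS:
            top = arg
            pushed.append(arg)
        elif name in _GET_OPS:  # fetch a memoized value back onto the stack
            top = memo.get(arg)
            pushed.append(top)
        elif name in _PUT_OPS:  # explicit memo index
            memo[arg] = top
        elif name == "MEMOIZE":  # protocol 4+: implicit memo index
            memo[len(memo)] = top
        elif name == "STACK_GLOBAL":  # protocol 4+: module and name pushed just before
            if len(pushed) >= 2 and pushed[-2] is not None and pushed[-1] is not None:
                found.add((pushed[-2], pushed[-1]))
            top = None
        elif name != "FRAME":
            top = None
    return found
```

For the file in the question, reading 'scaler.scl' in binary mode and passing its bytes to referenced_globals would presumably list ('sklearn.preprocessing._data', 'StandardScaler'), matching the error message.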

It looks like a scikit-learn version issue. I have scikit-learn 0.20.3 and Python 3.7.3.

However, I am running Python from an Anaconda .zip package. Is it possible to solve this without updating scikit-learn?
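One possible workaround that avoids upgrading, sketched here as an assumption rather than a tested fix, is to remap the module path while loading by overriding pickle.Unpickler.find_class. A pickle written by a newer scikit-learn refers to sklearn.preprocessing._data, which 0.20.3 does not have; 0.20.3 defines StandardScaler in sklearn.preprocessing.data. The sketch assumes that rename is the only incompatibility and that 0.20.3 still understands the estimator's saved attributes; neither is guaranteed, so compare the transformed output against a known-good result. The names SKLEARN_RENAMES, RenamingUnpickler and load_with_renames are hypothetical.

```python
import pickle
from typing import Any, BinaryIO, Dict

# Hypothetical mapping for the case in the question: the newer path stored
# in the pickle -> the path where 0.20.x defines StandardScaler.
SKLEARN_RENAMES: Dict[str, str] = {
    "sklearn.preprocessing._data": "sklearn.preprocessing.data",
}


class RenamingUnpickler(pickle.Unpickler):
    """Unpickler that rewrites module paths before resolving each global."""

    def __init__(self, file: BinaryIO, renames: Dict[str, str]) -> None:
        super().__init__(file)
        self.renames = renames

    def find_class(self, module: str, name: str) -> Any:
        # Substitute the old module path when the pickle uses a renamed one.
        return super().find_class(self.renames.get(module, module), name)


def load_with_renames(file: BinaryIO, renames: Dict[str, str]) -> Any:
    """Unpickle one object from an open binary file, applying renames."""
    return RenamingUnpickler(file, renames).load()
```

With the file from the question, the call would be load_with_renames(f, SKLEARN_RENAMES) inside a `with open('scaler.scl', 'rb') as f:` block. As with plain pickle.load, only load files from a trusted source.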


Solution

  • I had exactly the same error message with StandardScaler using Anaconda.

    I fixed it by running:

    conda update --all
    

    I think the issue was caused by creating the scaler file with pickle.dump on a machine with a newer version of scikit-learn and then running pickle.load on a machine with an older version. (Both were Windows machines. Loading failed on the machine with the older scikit-learn but worked on the machine with the newer one.) This is probably because newer releases (0.22 onward) moved modules to underscore-prefixed private names, e.g. sklearn.preprocessing.data became sklearn.preprocessing._data, so the older installation cannot find the module path recorded in the pickle.

    Anaconda would not let me update scikit-learn on its own, claiming that the older version was required (I could not work out why; perhaps another installed package depended on it). Updating all packages at the same time worked.
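Because the failure comes from saving and loading with different scikit-learn versions, one way to make the mismatch explicit is to write the version as a separate small pickle in front of the model, so it can be read and compared before the model itself is unpickled (unpickling the model first would fail with the same ModuleNotFoundError before any check could run). A minimal sketch follows; save_with_version and load_with_version_check are hypothetical helper names, and the caller passes in the version string, for example sklearn.__version__.

```python
import pickle
from typing import Any


def save_with_version(obj: Any, path: str, version: str) -> None:
    """Write two pickles to one file: a version string, then the object."""
    with open(path, "wb") as f:
        pickle.dump(version, f)  # header: a plain str, needs no third-party imports
        pickle.dump(obj, f)      # payload: the fitted model or scaler


def load_with_version_check(path: str, version: str) -> Any:
    """Compare the stored version before unpickling the payload."""
    with open(path, "rb") as f:
        saved = pickle.load(f)  # reads only the first pickle (the header)
        if saved != version:
            raise RuntimeError(
                f"file was written with version {saved}, "
                f"but version {version} is installed"
            )
        return pickle.load(f)  # reads the second pickle (the payload)
```

Exact string equality is deliberately strict; comparing only major and minor versions would be a looser design choice.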