Tags: python, numpy, scikit-learn, pickle

Pickle and Numpy versions


I have some old sklearn models that I can't retrain. They were pickled a long time ago with unknown library versions. I can load them with Python 3.6 and NumPy 1.14, but when I move to Python 3.8 with NumPy 1.18, loading them causes a segfault.

I tried re-dumping them with pickle protocol 4 from Python 3.6, but it didn't help.

Saving:

import pickle
with open('model.pkl', 'wb') as fid:
    pickle.dump(model, fid, protocol=4)

Loading:

import pickle
model = pickle.load(open('model.pkl', "rb"))

Is there anything I can do in this situation?
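Changing the protocol only changes how the pickle is encoded; the file still names the same classes (for a fitted scikit-learn model, typically the estimator class, its internal tree class and NumPy's array-reconstruction helpers), and their saved state has to be understood by whichever versions are installed. To see which classes a pickle depends on without executing it, one option is to walk its opcodes with the standard library's `pickletools.genops`. The helper `referenced_globals` is hypothetical: a minimal sketch that assumes the file was written by CPython's own pickler. It tracks the memo only approximately and is not a full unpickling stack machine.

```python
import pickletools

# Opcodes that push a text string onto the unpickling stack.
_STRING_OPS = {
    "STRING", "BINSTRING", "SHORT_BINSTRING",
    "UNICODE", "BINUNICODE", "SHORT_BINUNICODE", "BINUNICODE8",
}


def referenced_globals(data):
    """List the (module, name) pairs a pickle would import, without unpickling it.

    Protocols 0-3 name classes with GLOBAL ("module name" in one argument).
    Protocol 4+ uses STACK_GLOBAL, which takes the two most recently pushed
    strings; repeated strings come back from the memo, so memo writes are
    tracked too. This is a simplification, not a full stack machine.
    """
    memo = {}
    pushed = []  # history of pushed values: a str, or None for anything else
    found = []
    for opcode, arg, _pos in pickletools.genops(data):
        name = opcode.name
        if name in _STRING_OPS:
            pushed.append(arg)
        elif name in ("GET", "BINGET", "LONG_BINGET"):
            pushed.append(memo.get(arg))
        elif name == "MEMOIZE":
            # MEMOIZE stores the top of the stack at the next free memo index.
            memo[len(memo)] = pushed[-1] if pushed else None
        elif name in ("PUT", "BINPUT", "LONG_BINPUT"):
            memo[arg] = pushed[-1] if pushed else None
        elif name == "GLOBAL":
            module, qualname = arg.split(" ", 1)
            found.append((module, qualname))
            pushed.append(None)
        elif name == "STACK_GLOBAL":
            last_two = pushed[-2:]
            if len(last_two) == 2 and all(isinstance(s, str) for s in last_two):
                found.append((last_two[0], last_two[1]))
            pushed.append(None)
        elif name != "FRAME":
            pushed.append(None)
    # Drop duplicates but keep first-seen order.
    return list(dict.fromkeys(found))
```

For example, `referenced_globals(open("model.pkl", "rb").read())` returns the module/class pairs the file refers to; entries under `sklearn` or `numpy` show which library internals loading it relies on.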


Solution

  • What worked for me (very task-specific, but it may help someone):

    With the old dependencies (where the model still loads), save only the training data:

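    import pickle
    import joblib
    # encoding="latin1" lets Python 3 read NumPy arrays pickled by Python 2
    model = pickle.load(open('model.pkl', "rb"), encoding="latin1")
    # tree_.get_arrays()[0] holds the training points stored in the fitted tree
    joblib.dump(model.tree_.get_arrays()[0], "training_data.pkl")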
    

    With the newer dependencies, refit a fresh model on that data and pickle it again:

    import pickle
    import joblib
    from sklearn.neighbors import KernelDensity
    
    data = joblib.load("training_data.pkl")
    # Use the same hyperparameters as the original model
    # (they can be read with model.get_params() in the old environment).
    kde = KernelDensity(
        algorithm="auto",
        atol=0,
        bandwidth=0.5,
        breadth_first=True,
        kernel="gaussian",
        leaf_size=40,
        metric="euclidean",
        metric_params=None,
        rtol=0
    ).fit(data)
    
    with open("new_model.pkl", "wb") as f:
        pickle.dump(kde, f)
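In the answer above, the hyperparameters of the refit model are copied by hand. A variant of the same idea, sketched here under assumptions, writes both the training points and `get_params()` to a plain JSON file in the old environment and rebuilds from it in the new one, so the intermediate file contains no pickled NumPy or scikit-learn objects at all. `export_kde_state` and `rebuild_from_state` are hypothetical helper names; the sketch assumes every hyperparameter value is JSON-serialisable and that the newer scikit-learn still accepts the same parameter names. Any `sample_weight` used in the original fit is not captured.

```python
import json


def export_kde_state(model, path):
    """Old environment: write training data and hyperparameters to plain JSON.

    Assumes `model` is a fitted KernelDensity whose tree_.get_arrays()[0]
    holds the training points, and whose get_params() values are plain
    strings, numbers, booleans or None.
    """
    data = model.tree_.get_arrays()[0]
    state = {
        "params": model.get_params(),
        # Convert each row to Python floats so no NumPy objects reach the file.
        "data": [[float(x) for x in row] for row in data],
    }
    with open(path, "w") as f:
        json.dump(state, f)


def rebuild_from_state(path, estimator_cls):
    """New environment: refit `estimator_cls` (e.g. KernelDensity) from the JSON file."""
    with open(path) as f:
        state = json.load(f)
    # The estimator's fit() is expected to return the fitted estimator itself.
    return estimator_cls(**state["params"]).fit(state["data"])
```

In the old environment this would be called as `export_kde_state(model, "kde_state.json")`, and in the new one as `kde = rebuild_from_state("kde_state.json", KernelDensity)`, after which `kde` can be pickled as before. JSON is larger and slower than joblib for big arrays; the trade-off is that it does not depend on any library's pickle format.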