python scikit-learn cross-validation grid-search

Sklearn GridSearchCV object changed after pickle dump/load


I have a GridSearchCV object that I created with:

grid_search = GridSearchCV(pred_home_pipeline, param_grid)

I would like to save the entire grid-search object so I can explore the model-tuning results later. I do not want to just save the best_estimator_. But after dumping and reloading, the reloaded and original grid_search objects are different in some way which I cannot track down.

import pickle

# save to disk
with open(filepath, 'wb') as handle:
    pickle.dump(grid_search, handle, protocol=pickle.HIGHEST_PROTOCOL)

# reload
with open(filepath, 'rb') as handle:
    grid_reloaded = pickle.load(handle)

# test object is unchanged after dump/reload
print(grid_search == grid_reloaded)    

False

Weird. Looking at the outputs of print(grid_search) and print(grid_reloaded), they certainly look the same.

And they create the exact same set of 525 predicted values for data I held out entirely from the grid-search process:

grid_search_preds  = grid_search.predict(X_test)
grid_reloaded_preds= grid_reloaded.predict(X_test)

(grid_search_preds == grid_reloaded_preds).all()

True

...Even though the best_estimator_ attributes are not technically the same:

grid_search.best_estimator_ == grid_reloaded.best_estimator_

False

...although the best_estimator_ attributes also look the same when comparing print(grid_search.best_estimator_) and print(grid_reloaded.best_estimator_).

What's going on here? Is it safe to save the GridSearchCV object for inspection later?


Solution

  • That's because the "==" comparison here reports whether the two operands are the same object, not whether they hold the same state.

    To see why, follow the class hierarchy: you'll see that __eq__ is never overridden (nor __cmp__, its Python 2 counterpart).

    Thus the "==" comparison falls back to the default object identity check, effectively a comparison of memory locations, and your reloaded instance can never be the same object as your current instance. That is why you get False even though both objects print identically and produce identical predictions.

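
The identity fallback can be reproduced without scikit-learn. The following is a minimal sketch, not the library's own code: PlainModel and ComparableModel are hypothetical stand-in classes, and roundtrip is a hypothetical helper that pickles to memory instead of to a file. It compares an object with its reloaded copy three ways: with the default "==", by comparing the stored attributes, and with a custom __eq__.

```python
import pickle


class PlainModel:
    """Hypothetical stand-in for an estimator; it defines no __eq__."""

    def __init__(self, alpha, coef):
        self.alpha = alpha
        self.coef = coef


class ComparableModel(PlainModel):
    """Same state, but equality is defined in terms of that state."""

    def __eq__(self, other):
        return type(other) is type(self) and vars(self) == vars(other)


def roundtrip(obj):
    """Serialize and deserialize in memory, mimicking dump/load to a file."""
    return pickle.loads(pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL))


plain = PlainModel(alpha=0.1, coef=[1.0, 2.0])
plain_reloaded = roundtrip(plain)

# Default "==" with no __eq__ defined: True only for the very same object.
identity_equal = plain == plain_reloaded

# Comparing the attribute dictionaries checks the stored state instead.
state_equal = vars(plain) == vars(plain_reloaded)

# With a custom __eq__, "==" compares state as well.
comparable = ComparableModel(alpha=0.1, coef=[1.0, 2.0])
custom_equal = comparable == roundtrip(comparable)

print(identity_equal, state_equal, custom_equal)  # False True True
```

For a fitted GridSearchCV, the analogous check is to compare the things you care about, such as best_params_, best_score_, or the predictions as in the question, rather than the objects themselves.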