python pandas pickle instagram-api

Why does pandas attempt to import a module when reading from a pickled file?


I collected some data via the Instagram API and stored it in a pandas DataFrame, which I then saved with the pandas .to_pickle() method.

When I try to load the DataFrame on another computer using the read_pickle() method, I get the following error:

Traceback (most recent call last):
  File "examine.py", line 14, in <module>
    dataframe = pd.read_pickle(args["dataframe"])
  File "/home/user/virtualenvs/geopandas/local/lib/python2.7/site-packages/pandas/io/pickle.py", line 65, in read_pickle
    return try_read(path)
  File "/home/user/virtualenvs/geopandas/local/lib/python2.7/site-packages/pandas/io/pickle.py", line 62, in try_read
    return pc.load(fh, encoding=encoding, compat=True)
  File "/home/user/virtualenvs/geopandas/local/lib/python2.7/site-packages/pandas/compat/pickle_compat.py", line 117, in load
    return up.load()
  File "/usr/lib/python2.7/pickle.py", line 858, in load
    dispatch[key](self)
  File "/usr/lib/python2.7/pickle.py", line 1090, in load_global
    klass = self.find_class(module, name)
  File "/usr/lib/python2.7/pickle.py", line 1124, in find_class
    __import__(module)
ImportError: No module named instagram.models

Any idea what causes this?


Solution

  • Pickle does not know how to recreate the classes on its own. The information on how an instance is unpickled and restored lives inside the class itself: __new__, __init__, __setstate__ and more.

    Similarly, when class instances are pickled, their class’s code and data are not pickled along with them. Only the instance data are pickled. This is done on purpose, so you can fix bugs in a class or add methods to the class and still load objects that were created with an earlier version of the class. If you plan to have long-lived objects that will see many versions of a class, it may be worthwhile to put a version number in the objects so that suitable conversions can be made by the class’s __setstate__() method.

    Source: Python pickle: What can be pickled and unpickled?

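    So to unpickle the object, pickle has to import its class, and therefore the module that defines it (here instagram.models). If that module is not installed on the machine doing the loading, the import fails with the ImportError from the question.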

    If you don't have (or don't want) the instagram module on that machine, convert the relevant values in your original DataFrame to plain built-in types (int, float, array, ...) before you pickle it.
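
To illustrate both points, here is a minimal sketch using only the standard library. The module name fake_instagram_models and the Media class are hypothetical stand-ins for instagram.models and its classes, not the real library. The sketch shows that the pickle of a custom object records the module name, that loading fails once the module is gone, and that converting to built-in types beforehand avoids the problem.

```python
import pickle
import sys
import types

# Hypothetical stand-in for a third-party module such as instagram.models.
MODULE_NAME = "fake_instagram_models"
fake_module = types.ModuleType(MODULE_NAME)


class Media(object):
    # Pretend the class was defined inside the stand-in module.
    __module__ = MODULE_NAME
    __qualname__ = "Media"

    def __init__(self, media_id, likes):
        self.media_id = media_id
        self.likes = likes


fake_module.Media = Media
sys.modules[MODULE_NAME] = fake_module

rows = [Media("a1", 10), Media("b2", 25)]

# Pickling custom objects stores the module and class name plus the
# instance data, but not the class code itself.
with_objects = pickle.dumps(rows)

# Converting to built-in types first removes the dependency on the module.
plain_rows = [{"media_id": m.media_id, "likes": m.likes} for m in rows]
plain_only = pickle.dumps(plain_rows)

# Simulate "another computer" where the module is not installed.
del sys.modules[MODULE_NAME]

try:
    pickle.loads(with_objects)
    load_error = None
except ImportError as exc:
    # pickle tried to import the module in order to find the class.
    load_error = exc

# The plain version loads without needing the module.
restored = pickle.loads(plain_only)

print("module name stored in pickle:", MODULE_NAME.encode() in with_objects)
print("loading custom objects failed:", type(load_error).__name__)
print("plain data restored:", restored)
```

For a DataFrame, the same idea would mean replacing any column that holds instagram objects with the plain values you actually need (ids, counts, strings) on the original machine, and only then calling .to_pickle().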