python, pandas, pickle

Trouble opening old pickle file


I am trying to load an old pickle file containing the airline dataset (https://arxiv.org/abs/1611.06740). The pickle is very old and I have trouble accessing it. When I try:

import pickle

objects = []
# The file may hold several pickled objects back to back; read until EOF
with open("airline.pickle", "rb") as openfile:
    while True:
        try:
            objects.append(pickle.load(openfile))
        except EOFError:
            break

I get the following warning and error:

FutureWarning: pandas.core.index is deprecated and will be removed in a future version.  The public classes are available in the top-level namespace.
  objects.append(pickle.load(openfile))
Traceback (most recent call last):
  File "c:\Users\LocalAdmin\surfdrive\Code\Python\Airline\pickleToCSV.py", line 9, in <module>
    objects.append(pickle.load(openfile))
TypeError: _reconstruct: First argument must be a sub-type of ndarray
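The FutureWarning suggests the pickle refers to classes in the old pandas.core.index module. As a sketch (not part of the original question), the standard-library pickletools module can list which classes a pickle references without unpickling anything, so no pandas or NumPy code runs. list_pickle_globals is a hypothetical helper name, and it assumes the file was written with pickle protocol 0 to 3 (which is likely for a pickle from the pandas 0.14 era, but not confirmed); newer protocols use the STACK_GLOBAL opcode, which this sketch does not decode.

```python
import pickletools


def list_pickle_globals(path):
    """Return the sorted 'module name' strings referenced by GLOBAL opcodes.

    Only the opcode stream of the first pickle in the file is walked;
    nothing is unpickled. Assumes protocol 0-3 (GLOBAL opcode); protocol 4+
    uses STACK_GLOBAL, whose arguments are not collected here.
    """
    found = set()
    with open(path, "rb") as f:
        # genops yields (opcode, argument, position) until the STOP opcode
        for opcode, arg, _pos in pickletools.genops(f):
            if opcode.name == "GLOBAL":
                # pickletools renders the argument as "module name"
                found.add(arg)
    return sorted(found)
```

Calling list_pickle_globals("airline.pickle") and printing the result should show entries such as old pandas or NumPy module paths, which tells you which library version the pickle expects.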

Loading it with pandas (pd.read_pickle) does not work either:

  File "C:\Users\LocalAdmin\surfdrive\Code\Python\Airline\Airline\lib\site-packages\pandas\io\pickle.py", line 203, in read_pickle
    return pickle.load(handles.handle)  # type: ignore[arg-type]
TypeError: _reconstruct: First argument must be a sub-type of ndarray

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "c:\Users\LocalAdmin\surfdrive\Code\Python\Airline\pickleToCSV.py", line 7, in <module>
    df = pd.read_pickle('airline.pickle')
  File "C:\Users\LocalAdmin\surfdrive\Code\Python\Airline\Airline\lib\site-packages\pandas\io\pickle.py", line 208, in read_pickle
    return pc.load(handles.handle, encoding=None)
  File "C:\Users\LocalAdmin\surfdrive\Code\Python\Airline\Airline\lib\site-packages\pandas\compat\pickle_compat.py", line 249, in load
    return up.load()
  File "C:\Users\LocalAdmin\AppData\Local\Programs\Python\Python39\lib\pickle.py", line 1212, in load
    dispatch[key[0]](self)
  File "C:\Users\LocalAdmin\AppData\Local\Programs\Python\Python39\lib\pickle.py", line 1725, in load_build
    for k, v in state.items():
AttributeError: 'tuple' object has no attribute 'items'

How can I access the file and save it to CSV? I need the data it contains. I am using pandas 1.2.4 and Python 3.9.


Solution

  • As mentioned in a previous answer, the error TypeError: _reconstruct: First argument must be a sub-type of ndarray is caused by a change between pandas 0.14 and 0.15 (Source): in 0.15, Index was refactored so that it no longer subclasses ndarray. The documentation states that pd.read_pickle can load pickles created by older pandas versions, but this does not work on recent versions. If you install an older version instead (I tested 0.17.1, which is available from PyPI and conda-forge), it loads the pickle file successfully.

    If you are using conda, the following should work:

    conda create -n old_pandas -c conda-forge pandas=0.17.* python=3.*
    conda activate old_pandas
    

    Then, in a Python prompt:

    import pandas as pd
    dataset = pd.read_pickle("airline.pickle")
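The steps above stop after loading, while the question also asks for a CSV. A minimal sketch for that last step, meant to run inside the old_pandas environment, assuming the pickle holds a single DataFrame or Series (the question does not confirm the object's type). pickle_to_csv is a hypothetical helper name. It avoids f-strings so it also runs on the older Python versions that pandas 0.17 supports.

```python
import pandas as pd


def pickle_to_csv(pickle_path, csv_path):
    """Load a pandas pickle and write its contents to CSV.

    Assumption: the pickle contains one DataFrame or Series. Any other
    object raises TypeError so it can be inspected by hand instead.
    """
    obj = pd.read_pickle(pickle_path)
    if isinstance(obj, pd.Series):
        # A Series becomes a one-column DataFrame for uniform CSV output
        obj = obj.to_frame()
    if not isinstance(obj, pd.DataFrame):
        raise TypeError("Expected a DataFrame or Series, got {!r}".format(type(obj)))
    # Keep the index, since it may carry meaningful data (e.g. timestamps)
    obj.to_csv(csv_path)
    return obj.shape
```

In the old_pandas environment, call pickle_to_csv("airline.pickle", "airline.csv"); the resulting CSV can then be read with pd.read_csv in any pandas version, including 1.2.4.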