
Pickled pandas DF exceeds maximum recursion depth


Unlike in this other question, I can to_pickle() my DataFrame just fine. I pulled a bunch of data from Twitter's API and put it into a DataFrame for analysis. I successfully pickled the DataFrame, but now I cannot read_pickle() it.

When I run the read_pickle() line, I get:

File ~/.cache/pypoetry/virtualenvs/twitter-gQnmvvjM-py3.11/lib/python3.11/site-packages/pandas/io/pickle.py:208, in read_pickle(filepath_or_buffer, compression, storage_options)
    205     with warnings.catch_warnings(record=True):
    206         # We want to silence any warnings about, e.g. moved modules.
    207         warnings.simplefilter("ignore", Warning)
--> 208         return pickle.load(handles.handle)
    209 except excs_to_catch:
    210     # e.g.
    211     #  "No module named 'pandas.core.sparse.series'"
    212     #  "Can't get attribute '__nat_unpickle' on <module 'pandas._libs.tslib"
    213     return pc.load(handles.handle, encoding=None)

File ~/.cache/pypoetry/virtualenvs/twitter-gQnmvvjM-py3.11/lib/python3.11/site-packages/tweepy/mixins.py:33, in DataMapping.__getattr__(self, name)
     31 def __getattr__(self, name):
     32     try:
---> 33         return self.data[name]
     34     except KeyError:
     35         raise AttributeError from None

File ~/.cache/pypoetry/virtualenvs/twitter-gQnmvvjM-py3.11/lib/python3.11/site-packages/tweepy/mixins.py:33, in DataMapping.__getattr__(self, name)
...
---> 33         return self.data[name]
     34     except KeyError:
     35         raise AttributeError from None

RecursionError: maximum recursion depth exceeded
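
The repeated DataMapping.__getattr__ frames point to a likely mechanism: unpickling creates the object without calling __init__, then looks up __setstate__ on it. That lookup falls through to __getattr__, which reads self.data before data has been restored, which calls __getattr__ again, and so on. Below is a minimal sketch that reproduces this without tweepy. FakeMapping is a hypothetical stand-in that copies only the __getattr__ body shown in the traceback; it is not tweepy's actual class.

```python
import pickle


class FakeMapping:
    """Hypothetical stand-in; only __getattr__ mirrors the traceback."""

    def __init__(self, data):
        self.data = data

    def __getattr__(self, name):
        # Python calls this only when normal attribute lookup fails.
        try:
            return self.data[name]
        except KeyError:
            raise AttributeError from None


obj = FakeMapping({"id": 1, "text": "hello"})
payload = pickle.dumps(obj)  # pickling works: self.data exists at this point

try:
    pickle.loads(payload)
    outcome = "loaded"
except RecursionError:
    # The fresh instance has no `data` yet, so self.data re-enters __getattr__.
    outcome = "RecursionError"

print(outcome)
```

Pickling succeeds because the object is fully initialised when it is written; the failure only appears on load, which matches the to_pickle()/read_pickle() behaviour described above.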

When I tried sys.setrecursionlimit(10**5) or higher, I got this error instead:

Canceled future for execute_request message before replies were done
The Kernel crashed while executing code in the the current cell or a previous cell. Please review the code in the cell(s) to identify a possible cause of the failure. Click here for more info. View Jupyter log for further details.

And the Jupyter log gives me this:

error 12:23:55.212: Raw kernel process exited code: undefined
error 12:23:55.214: Error in waiting for cell to complete Error: Canceled future for execute_request message before replies were done
    at t.KernelShellFutureHandler.dispose (/home/ryan/.vscode-server/extensions/ms-toolsai.jupyter-2023.3.1000892223/out/extension.node.js:2:32419)
    at /home/ryan/.vscode-server/extensions/ms-toolsai.jupyter-2023.3.1000892223/out/extension.node.js:2:51471
    at Map.forEach (<anonymous>)
    at y._clearKernelState (/home/ryan/.vscode-server/extensions/ms-toolsai.jupyter-2023.3.1000892223/out/extension.node.js:2:51456)
    at y.dispose (/home/ryan/.vscode-server/extensions/ms-toolsai.jupyter-2023.3.1000892223/out/extension.node.js:2:44938)
    at /home/ryan/.vscode-server/extensions/ms-toolsai.jupyter-2023.3.1000892223/out/extension.node.js:17:96826
    at ee (/home/ryan/.vscode-server/extensions/ms-toolsai.jupyter-2023.3.1000892223/out/extension.node.js:2:1589492)
    at jh.dispose (/home/ryan/.vscode-server/extensions/ms-toolsai.jupyter-2023.3.1000892223/out/extension.node.js:17:96802)
    at Lh.dispose (/home/ryan/.vscode-server/extensions/ms-toolsai.jupyter-2023.3.1000892223/out/extension.node.js:17:104079)
    at processTicksAndRejections (node:internal/process/task_queues:96:5)
warn 12:23:55.215: Cell completed with errors {
  message: 'Canceled future for execute_request message before replies were done'
}

Solution

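  • The issue turned out to be caused by Tweepy: the DataFrame held Tweepy objects, and their DataMapping.__getattr__ recurses without end when they are unpickled. A pull request on GitHub patches the Tweepy library to fix the recursion on unpickling. Raising the recursion limit cannot help, because the recursion never terminates; a higher limit only lets it run until the interpreter's stack is exhausted, which is the likely reason the kernel crashed.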
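
Until a patched Tweepy is installed, there are two possible ways around the problem, sketched here under assumptions. SafeMapping is a hypothetical stand-in class, and the guard in its __getattr__ is one common way to break this kind of recursion; it is not necessarily what the pull request does. The second approach avoids the library class entirely by pickling the plain dict that the object wraps.

```python
import pickle


class SafeMapping:
    """Hypothetical stand-in for a mapping-style class, not tweepy's code."""

    def __init__(self, data):
        self.data = data

    def __getattr__(self, name):
        # While unpickling, `data` is not set yet. Fail fast here instead of
        # re-entering __getattr__ through self.data.
        if name == "data":
            raise AttributeError(name)
        try:
            return self.data[name]
        except KeyError:
            raise AttributeError(name) from None


original = SafeMapping({"id": 1, "text": "hello"})

# Approach 1: with the guard in place, the object survives a round trip.
restored = pickle.loads(pickle.dumps(original))

# Approach 2: no library change needed; pickle the plain dict instead.
plain = pickle.loads(pickle.dumps(original.data))
```

For the DataFrame in the question, the second approach would mean converting each stored object to its underlying dict (the data attribute visible in the traceback) before calling to_pickle(), so that the pickle file contains only built-in types.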