Unlike this other question, I am able to call to_pickle() on my dataframe just fine. I've pulled a bunch of data from Twitter's API and loaded it into a dataframe for analysis. I successfully pickled the dataframe, but now I am unable to read_pickle() it.
When I run the read_pickle() line, I get:
File ~/.cache/pypoetry/virtualenvs/twitter-gQnmvvjM-py3.11/lib/python3.11/site-packages/pandas/io/pickle.py:208, in read_pickle(filepath_or_buffer, compression, storage_options)
205 with warnings.catch_warnings(record=True):
206 # We want to silence any warnings about, e.g. moved modules.
207 warnings.simplefilter("ignore", Warning)
--> 208 return pickle.load(handles.handle)
209 except excs_to_catch:
210 # e.g.
211 # "No module named 'pandas.core.sparse.series'"
212 # "Can't get attribute '__nat_unpickle' on <module 'pandas._libs.tslib"
213 return pc.load(handles.handle, encoding=None)
File ~/.cache/pypoetry/virtualenvs/twitter-gQnmvvjM-py3.11/lib/python3.11/site-packages/tweepy/mixins.py:33, in DataMapping.__getattr__(self, name)
31 def __getattr__(self, name):
32 try:
---> 33 return self.data[name]
34 except KeyError:
35 raise AttributeError from None
File ~/.cache/pypoetry/virtualenvs/twitter-gQnmvvjM-py3.11/lib/python3.11/site-packages/tweepy/mixins.py:33, in DataMapping.__getattr__(self, name)
...
---> 33 return self.data[name]
34 except KeyError:
35 raise AttributeError from None
RecursionError: maximum recursion depth exceeded
When I tried sys.setrecursionlimit(10**5) or higher, I got this error instead:
Canceled future for execute_request message before replies were done
The Kernel crashed while executing code in the the current cell or a previous cell. Please review the code in the cell(s) to identify a possible cause of the failure. Click here for more info. View Jupyter log for further details.
And the Jupyter log gives me this:
error 12:23:55.212: Raw kernel process exited code: undefined
error 12:23:55.214: Error in waiting for cell to complete Error: Canceled future for execute_request message before replies were done
at t.KernelShellFutureHandler.dispose (/home/ryan/.vscode-server/extensions/ms-toolsai.jupyter-2023.3.1000892223/out/extension.node.js:2:32419)
at /home/ryan/.vscode-server/extensions/ms-toolsai.jupyter-2023.3.1000892223/out/extension.node.js:2:51471
at Map.forEach (<anonymous>)
at y._clearKernelState (/home/ryan/.vscode-server/extensions/ms-toolsai.jupyter-2023.3.1000892223/out/extension.node.js:2:51456)
at y.dispose (/home/ryan/.vscode-server/extensions/ms-toolsai.jupyter-2023.3.1000892223/out/extension.node.js:2:44938)
at /home/ryan/.vscode-server/extensions/ms-toolsai.jupyter-2023.3.1000892223/out/extension.node.js:17:96826
at ee (/home/ryan/.vscode-server/extensions/ms-toolsai.jupyter-2023.3.1000892223/out/extension.node.js:2:1589492)
at jh.dispose (/home/ryan/.vscode-server/extensions/ms-toolsai.jupyter-2023.3.1000892223/out/extension.node.js:17:96802)
at Lh.dispose (/home/ryan/.vscode-server/extensions/ms-toolsai.jupyter-2023.3.1000892223/out/extension.node.js:17:104079)
at processTicksAndRejections (node:internal/process/task_queues:96:5)
warn 12:23:55.215: Cell completed with errors {
message: 'Canceled future for execute_request message before replies were done'
}
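The issue turned out to be caused by tweepy and some of the tweet data stored in the DataFrame. Judging by the traceback, tweepy's DataMapping.__getattr__ reads self.data, and when pickle rebuilds such an object that attribute is not set yet, so the lookup of self.data calls __getattr__ again without end. A pull request on GitHub has a patch for the Tweepy library to fix the recursion issue on unpickling.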
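For anyone who wants to see the mechanism without tweepy installed, here is a minimal sketch. The `Mapping` class is a hypothetical stand-in that only copies the `__getattr__` body shown in the traceback; it is not tweepy's real class. `GuardedMapping` shows one common way to break the cycle, and I am not claiming it matches the actual patch in the pull request.

```python
import pickle


class Mapping:
    """Hypothetical stand-in that copies the __getattr__ from the traceback."""

    def __init__(self, data):
        self.data = data

    def __getattr__(self, name):
        # Only called when normal lookup fails. During unpickling the
        # instance exists before `data` is set, so `self.data` lands
        # here again and recurses.
        try:
            return self.data[name]
        except KeyError:
            raise AttributeError from None


class GuardedMapping(Mapping):
    """Same behaviour, but never resolves `data` through itself."""

    def __getattr__(self, name):
        if name == "data":
            # `data` is missing (object not fully built yet): fail fast.
            raise AttributeError(name)
        try:
            return self.data[name]
        except KeyError:
            raise AttributeError(name) from None


def round_trip(obj):
    """Pickle and immediately unpickle an object."""
    return pickle.loads(pickle.dumps(obj))


# Pickling works, unpickling hits the recursion.
try:
    round_trip(Mapping({"id": 1}))
    unguarded_result = "ok"
except RecursionError:
    unguarded_result = "RecursionError"

# The guarded variant survives the round trip.
restored = round_trip(GuardedMapping({"id": 1}))
print(unguarded_result, restored.id)
```

If patching the library is not an option, another possible workaround (assuming the objects in your DataFrame keep their raw payload in a `data` dict, as the traceback suggests) is to store those plain dicts in the DataFrame instead of the tweepy objects before calling to_pickle().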