python json pandas twitter jsonpickle

KeyError reading json_pickle(d) tweets into dataframe using read_pickle


Using Python 2.7 with the jsonpickle and pandas libraries, I saved a number of tweets to a .txt file using jsonfile.write(jsonpickle.encode(tweets._json,unpicklable=False)+'\n'), which encodes the JSON value of each tweet with the jsonpickle package and writes it on its own line.

When I try to read the .txt file into a pandas DataFrame in a different script using tester = pandas.read_pickle(fileToProcess), I get a KeyError.

The most recent call in the traceback is:

File "C:\Python27\lib\pickle.py", line 858, in load
    dispatch[key](self)
KeyError: '{'

I get the same error with a number of files I created. Here is an example file, 3.8 MB in size: Sample Tweets File. I'm new to JSON files, but can a pandas or pickle expert help me get my tweets into a DataFrame?


Solution

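  • The read_pickle method is intended to deserialize data created with the pickle module, e.g. data serialized with the to_pickle method of a Series, DataFrame, or Panel, as shown in this answer. Your file contains JSON text rather than pickle data, so the unpickler reads the leading '{' as an opcode it does not recognize, hence the KeyError: '{'.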

    If you are using jsonpickle.encode, you should use its counterpart from the same library, jsonpickle.decode, to deserialize your data.

    In general, I think you are better off using pandas' own serialization mechanism.

    But if you really want to use jsonpickle:

    1. Note the following from the documentation:

      If you will never need to load (regenerate the Python class from JSON), you can pass in the keyword unpicklable=False

    So you shouldn't pass unpicklable=False to the encode method if you intend to decode the data back into Python objects.

    2. You appear to be saving one object per line in your file, so you should decode the file line by line, something along these lines:

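    import jsonpickle

    tweets = []
    with open(fileToProcess) as f:  # fileToProcess is the path to the .txt file
        for line in f:
            line = line.rstrip('\n')  # drop the trailing newline
            tweets.append(jsonpickle.decode(line))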
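
Since the file was written with unpicklable=False, each line should already be plain JSON, so the standard json module may be enough to read it back. The following is a minimal sketch under that assumption. The helper name load_json_lines is hypothetical and not part of jsonpickle or pandas.

```python
import io
import json


def load_json_lines(path):
    """Return one dict per non-empty line of a file holding one JSON object per line."""
    records = []
    # io.open behaves the same on Python 2.7 and Python 3
    with io.open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()  # drop the trailing newline and stray whitespace
            if line:  # skip blank lines
                records.append(json.loads(line))
    return records
```

The resulting list of dicts can then be passed to pandas.DataFrame, e.g. pandas.DataFrame(load_json_lines(fileToProcess)). Nested fields such as the tweet's user object would end up as dict-valued columns, so some flattening may still be needed.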