python json pandas twitter keyerror

KeyError occurs while opening the JSON txt file and loading it into a DataFrame


My code gave me an empty DataFrame with no saved tweets. I tried to debug it by adding print(line) under both for line in json_file: and json_data = json.loads(line). That resulted in a KeyError. How do I fix it? Thank you.

import json
import pandas as pd

list_df = list()
# read the .txt file, line by line, and append the json data in each line to the list
with open('tweet_json.txt', 'r') as json_file:
    for line in json_file:
        print(line)
        json_data = json.loads(line)
        print(line)
        tweet_id = json_data['tweet_id']
        fvrt_count = json_data['favorite_count']
        rtwt_count = json_data['retweet_count']
        list_df.append({'tweet_id': tweet_id,
                        'favorite_count': fvrt_count,
                        'retweet_count': rtwt_count})

# create a pandas DataFrame using the list
df = pd.DataFrame(list_df, columns = ['tweet_id', 'favorite_count', 'retweet_count'])
df.head()

Solution

  • Your comment says you're trying to save to a file, but your code reads from a file. Here are examples of how to do both:

    Writing to JSON

    import json
    import pandas as pd
    
    content = {  # This is just dummy data, in the form of a dictionary
        "tweet1": {
            "id": 1,
            "msg": "Yay, first!"
        },
        "tweet2": {
            "id": 2,
            "msg": "I'm always second :("
        }
    }
    # Write it to a file called "tweet_json.txt" in JSON
    with open("tweet_json.txt", "w") as json_file:
        json.dump(content, json_file, indent=4)  # indent=4 is optional, it makes it easier to read
    

    Note the w (as in write) in open("tweet_json.txt", "w"). You're using r (as in read), which opens the file read-only, so nothing can be written to it. Also note the use of json.dump() rather than json.load(). We then get a file that looks like this:

    $ cat tweet_json.txt
    {
        "tweet1": {
            "id": 1,
            "msg": "Yay, first!"
        },
        "tweet2": {
            "id": 2,
            "msg": "I'm always second :("
        }
    }
    

    Reading from JSON

    Let's read the file that we just wrote, using pandas' read_json():

    import pandas as pd
    
    df = pd.read_json("tweet_json.txt")
    print(df)
    

    Output looks like this:

    >>> df
              tweet1                tweet2
    id             1                     2
    msg  Yay, first!  I'm always second :(
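
As for the KeyError itself: it means that one of the keys you look up ('tweet_id', 'favorite_count' or 'retweet_count') is not present in the parsed line. I can't see your file, so the following is only a sketch under assumptions: the sample records are made up, and the guess that the identifier is stored under "id" rather than "tweet_id" may not match your data. It writes one JSON object per line (which is what a line-by-line json.loads() loop needs, unlike the indented file above), prints the keys of the first record so you can see what is really there, and uses dict.get() so that a missing key gives None instead of raising.

```python
import json
import os
import tempfile

# Hypothetical sample records; real tweet objects have many more fields.
# Assumption: the identifier is stored under "id", not "tweet_id".
sample_tweets = [
    {"id": 1, "favorite_count": 10, "retweet_count": 3},
    {"id": 2, "favorite_count": 0},  # retweet_count deliberately missing
]

# Write one JSON object per line so the file can be parsed line by line.
# A temporary directory is used so no existing tweet_json.txt is overwritten.
path = os.path.join(tempfile.mkdtemp(), "tweet_json.txt")
with open(path, "w") as json_file:
    for tweet in sample_tweets:
        json_file.write(json.dumps(tweet) + "\n")

rows = []
with open(path, "r") as json_file:
    for line in json_file:
        line = line.strip()
        if not line:
            continue  # skip blank lines instead of passing them to json.loads()
        json_data = json.loads(line)
        if not rows:
            # Print the keys of the first record to see the actual field names.
            print(sorted(json_data.keys()))
        rows.append({
            "tweet_id": json_data.get("id"),  # .get() returns None if absent
            "favorite_count": json_data.get("favorite_count"),
            "retweet_count": json_data.get("retweet_count"),
        })

print(rows)
```

Once the printed keys confirm the real field names, the rows list can be passed to pd.DataFrame() exactly as in your question.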