python json python-3.x tweepy

Reading a JSON file with multiple dictionaries


I've created a JSON file that contains the tweets I've streamed. The file holds multiple dictionaries, one for each tweet. When I try to read this file, I get this error:

json.decoder.JSONDecodeError: Extra data: line 2 column 1 (char 3419)

This position is where the next record (tweet/dictionary) starts. How can I fix this? I looked for similar answers, but they weren't relevant to my problem. How can I read this file? Am I storing it the wrong way?

This is the JSON file:

{"created_at": "Thu Jul 18 12:06:44 +0000 2019", "id": 1151825627051257856, "id_str": "1151825627051257856", "text": "@godhoonbey @cuttingedge2019 Unparalleled greed for power to loot on display in Karnataka in history of India. Did\u2026 ", "display_text_range": [29, 140], "source": "<a href=\"" rel=\"nofollow\">Twitter for Android</a>", "truncated": true, "in_reply_to_status_id": 1151797702419787778, "in_reply_to_status_id_str": "1151797702419787778", "in_reply_to_user_id": 840249609368797186,
.
.
.
.
"lang": "en", "timestamp_ms": "1563451604031"
}
{
    # another tweet content
}
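A minimal reproduction, using a tiny made-up string in place of the real tweet file, shows why the decoder stops exactly where the second record begins. `json.loads` (and `json.load`, which reads the whole file and hands the text to the same decoder) parses the first object successfully, skips the whitespace after it, and raises `Extra data` because more text remains.

```python
import json

# Two complete JSON objects written back to back, like the tweet file.
two_docs = '{"id": 1}\n{"id": 2}'

message = None
try:
    json.loads(two_docs)
except json.JSONDecodeError as err:
    # The first object parses fine; the error points at the start of the second.
    message = str(err)

print(message)  # prints: Extra data: line 2 column 1 (char 10)
```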

Solution

  • Your file isn't valid JSON: a JSON document may contain only one top-level value, but your file has several objects written one after another.

    You need to wrap the contents in [ and ] to make one big list, and put a comma between consecutive documents to separate them.

    If (and only if) each document is on a single line of its own (which I assume because the error is on line 2 column 1), you can parse it line by line using json.loads, like this:

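    import json


    def parse_data(filename):
        # Each non-empty line holds one complete JSON object (one tweet).
        with open(filename, 'r', encoding='utf-8') as f:
            for line in f:
                if line.strip():  # skip blank lines, e.g. a trailing newline
                    yield json.loads(line)


    filename = 'tweets.json'  # hypothetical path; use your own file
    data = list(parse_data(filename))
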

    BUT, you should really just make the file valid JSON by wrapping the documents in one big list, as I first suggested.
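If the objects are not one per line, the line-by-line approach breaks. The sketch below is an assumption, not part of the answer above: it uses the standard library's `json.JSONDecoder.raw_decode`, which parses one value starting at a given index and returns that value together with the index where it ended, to walk through back-to-back objects regardless of line breaks. It then writes them out as a single list, which produces the `[`, `]` and commas described above. The function names `parse_concatenated` and `convert_to_json_array` are hypothetical.

```python
import json


def parse_concatenated(text):
    """Return a list of the JSON values written back to back in `text`."""
    decoder = json.JSONDecoder()
    docs = []
    pos = 0
    while True:
        # Skip whitespace (e.g. newlines) between documents; raw_decode does not.
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos == len(text):
            break
        # raw_decode parses one value starting at `pos` and returns
        # (value, index just past the end of that value).
        obj, pos = decoder.raw_decode(text, pos)
        docs.append(obj)
    return docs


def convert_to_json_array(src_path, dst_path):
    """Rewrite a file of concatenated JSON objects as one JSON list."""
    with open(src_path, 'r', encoding='utf-8') as f:
        docs = parse_concatenated(f.read())
    with open(dst_path, 'w', encoding='utf-8') as f:
        # json.dump emits the surrounding [ ] and the commas between documents.
        json.dump(docs, f)
    return len(docs)
```

For example, `convert_to_json_array('tweets.json', 'tweets_list.json')` (both file names hypothetical) writes the list; afterwards a plain `json.load` on the new file returns a Python list with one dict per tweet.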