I've created a JSON file that contains the tweets that I've streamed. The file has multiple dictionaries, i.e., one for each tweet. When I try to read this file, I get the following error:
json.decoder.JSONDecodeError: Extra data: line 2 column 1 (char 3419)
This position is where the next record/tweet/dictionary starts. How can I fix this problem? I looked at answers to similar questions, but they weren't relevant to my problem. How can I read this file? Am I storing it the wrong way?
This is the JSON file:
{"created_at": "Thu Jul 18 12:06:44 +0000 2019", "id": 1151825627051257856, "id_str": "1151825627051257856", "text": "@godhoonbey @cuttingedge2019 Unparalleled greed for power to loot on display in Karnataka in history of India. Did\u2026 ", "display_text_range": [29, 140], "source": "<a href=\"" rel=\"nofollow\">Twitter for Android</a>", "truncated": true, "in_reply_to_status_id": 1151797702419787778, "in_reply_to_status_id_str": "1151797702419787778", "in_reply_to_user_id": 840249609368797186,
.
.
.
.
"lang": "en", "timestamp_ms": "1563451604031"
}
{
# another tweet content
}
Your file isn't valid JSON because of this: a JSON file may contain only one top-level value, and yours contains several dictionaries back to back.
You need to wrap it with [ and ] to make it one big list, and add a comma between consecutive documents (to separate them).
If (and only if) each document is on a single line of its own (which I assume because the error is on line 2 column 1), you can parse it line by line using json.loads, like this:
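    import json

    def parse_data(filename):
        # Each non-blank line must hold exactly one complete JSON document.
        with open(filename, 'r') as f:
            for line in f:
                if line.strip():
                    yield json.loads(line)

    data = list(parse_data(filename))  # filename is the path to your file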
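If the documents turn out not to sit one per line (for example, a pretty-printed tweet that spans several lines), the line-by-line approach fails. One possible workaround is json.JSONDecoder.raw_decode from the standard library, which parses a single document starting at a given index and reports where it ended. The sketch below is only an illustration: the helper name iter_json_documents and the sample string are made up for this example and are not part of your code.

```python
import json

def iter_json_documents(text):
    """Yield each JSON value from a string of back-to-back documents."""
    decoder = json.JSONDecoder()
    pos = 0
    length = len(text)
    while pos < length:
        # raw_decode does not accept leading whitespace, so skip it manually.
        while pos < length and text[pos].isspace():
            pos += 1
        if pos >= length:
            break
        # Returns the parsed value and the index just past the document.
        obj, pos = decoder.raw_decode(text, pos)
        yield obj

# Hypothetical sample: two documents, each spread over two lines.
sample = '{"id": 1,\n "lang": "en"}\n{"id": 2,\n "lang": "hi"}\n'
tweets = list(iter_json_documents(sample))
print(len(tweets))  # 2
```

For a real file you would pass in the result of reading the whole file as text, which means the entire file has to fit in memory.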
That said, you should really just make it valid JSON by wrapping it in a big list, as I first suggested.
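A one-off conversion along those lines could look roughly like this. It assumes each tweet occupies exactly one line of the existing file; the function name convert_to_json_array and the file names in the demo are placeholders chosen for the example, not anything from your setup.

```python
import json
import os
import tempfile

def convert_to_json_array(src_path, dst_path):
    """Rewrite a one-document-per-line file as a single JSON array."""
    with open(src_path, "r", encoding="utf-8") as infile:
        # Skip blank lines; every other line must be one complete document.
        records = [json.loads(line) for line in infile if line.strip()]
    with open(dst_path, "w", encoding="utf-8") as outfile:
        # json.dump writes the brackets and the separating commas for us.
        json.dump(records, outfile)
    return len(records)

# Demo on a throwaway file standing in for the streamed tweets.
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "tweets.txt")
    dst = os.path.join(tmp, "tweets.json")
    with open(src, "w", encoding="utf-8") as f:
        f.write('{"id": 1, "lang": "en"}\n{"id": 2, "lang": "hi"}\n')
    count = convert_to_json_array(src, dst)
    with open(dst, "r", encoding="utf-8") as f:
        tweets = json.load(f)  # a single json.load call now reads everything
```

Letting json.dump produce the list avoids inserting brackets and commas by hand, where a stray trailing comma would make the file invalid again.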