Tags: python, json, data-analysis, tweets

Python Traceback Error When Reading JSON File


I just exported all my Twitter history to a JSON file, and I want to do some data analysis on the tweets with Python. I opened the terminal and entered the following commands to load the JSON in Python:

>>> import json
>>> with open('tweet.js') as json_file:
...     data = json.load(json_file)
...     print(data)

and got this traceback:

Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
  File "C:\Users\George\AppData\Local\Programs\Python\Python38-32\lib\json\__init__.py", line 293, in load
    return loads(fp.read(),
  File "C:\Users\George\AppData\Local\Programs\Python\Python38-32\lib\encodings\cp1252.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x8d in position 4771: character maps to <undefined>

The JSON file is named tweet.js, and its contents follow this form:

{
  "retweeted" : false,
  "source" : "<a href=\"http://twitter.com/download/android\" rel=\"nofollow\">Twitter for Android</a>",
  "entities" : {
    "hashtags" : [ ],
    "symbols" : [ ],
    "user_mentions" : [ {
      "name" : "Florin Pop \uD83D\uDC68\uD83C\uDFFB‍\uD83D\uDCBB",
      "screen_name" : "florinpop1705",
      "indices" : [ "0", "14" ],
      "id_str" : "861320851",
      "id" : "861320851"
    } ],
    "urls" : [ ]
  },
  "display_text_range" : [ "0", "155" ],
  "favorite_count" : "0",
  "in_reply_to_status_id_str" : "1194246195243302913",
  "id_str" : "1200417547524493312",
  "in_reply_to_user_id" : "861320851",
  "truncated" : false,
  "retweet_count" : "0",
  "id" : "1200417547524493312",
  "in_reply_to_status_id" : "1194246195243302913",
  "created_at" : "Fri Nov 29 14:13:40 +0000 2019",
  "favorited" : false,
  "full_text" : "@florinpop1705 I've heard good things about it, but never tried it.... Using kdenlive is simple yet some things are difficult to implement like text effect",
  "lang" : "en",
  "in_reply_to_screen_name" : "florinpop1705",
  "in_reply_to_user_id_str" : "861320851"
}
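A plausible source of the failing byte: the invisible character between \uDFFB and \uD83D in the "name" field appears to be a literal ZERO WIDTH JOINER (U+200D), which glues emoji such as "man" and "laptop" together. The short sketch below (assuming that character is indeed U+200D) shows that it is stored in UTF-8 as three bytes ending in 0x8d, and that cp1252 cannot decode that byte.

```python
# U+200D ZERO WIDTH JOINER, used inside emoji sequences like the one in "name".
zwj = "\u200d"

# In UTF-8 it becomes the three bytes E2 80 8D; note the final 0x8d.
zwj_bytes = zwj.encode("utf-8")
print(zwj_bytes)  # b'\xe2\x80\x8d'

# Decoding with the codec the file was written in round-trips cleanly.
print(zwj_bytes.decode("utf-8") == zwj)  # True

# cp1252 has no character assigned to 0x8d, so the same bytes fail to decode,
# just like in the traceback.
try:
    zwj_bytes.decode("cp1252")
except UnicodeDecodeError as exc:
    print("cp1252 failed at byte", exc.start, "->", exc.reason)
```

The first two bytes (0xE2 and 0x80) happen to be valid cp1252 characters, so the error only surfaces at the third byte.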

Solution

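  • The problem is the file encoding, not the JSON itself. When open() is called without an encoding argument, Python uses the platform's preferred encoding, which on this Windows setup is cp1252 (note encodings\cp1252.py in the traceback). The file is encoded as UTF-8, and byte 0x8d has no mapping in cp1252, so decoding fails before json.load ever sees the text. Specify the encoding when you open the file: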

    import json

    # Decode the file as UTF-8 instead of the Windows default (cp1252).
    with open("tweet.js", encoding="utf8") as json_file:
        data = json.load(json_file)
    print(data)
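Setting the encoding is enough when the file really is plain JSON. One thing worth checking, stated here as an assumption rather than a certainty: the tweet.js in an official Twitter data archive is a JavaScript file that typically begins with an assignment such as window.YTD.tweet.part0 = [ ... ], and json.load rejects that prefix even once the encoding is right. A minimal sketch that reads the file as UTF-8 and skips anything before the first [ or {, using a hypothetical helper name load_twitter_js:

```python
import json
from pathlib import Path


def load_twitter_js(path):
    """Load a tweet.js-style file, tolerating a leading JavaScript assignment.

    Assumptions (hypothetical helper, not part of any library): the file is
    UTF-8 and contains a single JSON array or object, optionally preceded by
    something like "window.YTD.tweet.part0 = " and followed by a ";".
    """
    # Same fix as above: decode explicitly as UTF-8, not the platform default.
    text = Path(path).read_text(encoding="utf-8")

    # Find where the JSON payload starts (first '[' or '{').
    starts = [i for i in (text.find("["), text.find("{")) if i != -1]
    if not starts:
        raise ValueError(f"no JSON array or object found in {path}")

    # Drop the prefix and any trailing semicolon, then parse.
    payload = text[min(starts):].rstrip().rstrip(";")
    return json.loads(payload)
```

Usage is then data = load_twitter_js("tweet.js"). A file that already starts with { or [ is parsed unchanged, so the helper is safe to use on plain JSON as well.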