
Jupyter unable to open json file, citing UnicodeDecodeError


Jupyter throws a UnicodeDecodeError when I try to open a JSON file in the directory. The error:

---------------------------------------------------------------------------
UnicodeDecodeError                        Traceback (most recent call last)
Cell In[13], line 2
      1 with open('recalls.json') as file:
----> 2     json_data = json.load(file, encoding='utf-8')

File ~\AppData\Local\Programs\Python\Python312\Lib\json\__init__.py:293, in load(fp, cls, object_hook, parse_float, parse_int, parse_constant, object_pairs_hook, **kw)
    274 def load(fp, *, cls=None, object_hook=None, parse_float=None,
    275         parse_int=None, parse_constant=None, object_pairs_hook=None, **kw):
    276     """Deserialize ``fp`` (a ``.read()``-supporting file-like object containing
    277     a JSON document) to a Python object.
    278 
   (...)
    291     kwarg; otherwise ``JSONDecoder`` is used.
    292     """
--> 293     return loads(fp.read(),
    294         cls=cls, object_hook=object_hook,
    295         parse_float=parse_float, parse_int=parse_int,
    296         parse_constant=parse_constant, object_pairs_hook=object_pairs_hook, **kw)

File ~\AppData\Local\Programs\Python\Python312\Lib\encodings\cp1252.py:23, in IncrementalDecoder.decode(self, input, final)
     22 def decode(self, input, final=False):
---> 23     return codecs.charmap_decode(input,self.errors,decoding_table)[0]

UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 172381: character maps to <undefined>

I'm trying to open a JSON file so that I can eventually convert it to CSV. The file is local and has been uploaded to Jupyter. I passed encoding='utf-8', and Chrome reports the file's encoding as UTF-8, but I still get the error. Are there other encodings I should try?

Here is the code I'm using:

import pandas as pd
import json

with open('recalls.json') as file:
    json_data = json.load(file, encoding='utf-8')

Solution

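  • I think you should specify the encoding when you open the file, not when you call json.load. Without an encoding argument, open() falls back to the platform's default encoding, which the traceback shows is cp1252 here, so the UTF-8 bytes are decoded with the wrong codec. json.load only reads already-decoded text from the file object, so an encoding keyword passed to it cannot change how the file is read: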

    with open('recalls.json', encoding='utf-8') as f:
        json_data = json.load(f)
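
    The following is a minimal sketch of why the default codec fails, not a reproduction of the asker's file. It assumes the offending byte comes from a right curly quote (U+201D), whose UTF-8 encoding ends in 0x9d, a byte that Python's cp1252 codec leaves undefined. The sample data and the file name sample.json are made up for illustration.

```python
import json
import os
import tempfile

# Hypothetical sample record. The closing curly quote U+201D is encoded
# in UTF-8 as the bytes E2 80 9D, and 0x9D has no mapping in cp1252.
sample = {"reason": "Labelled \u201cpeanut free\u201d in error"}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.json")

    # Write the file as real UTF-8 (no \u escapes) so the raw bytes are present.
    with open(path, "w", encoding="utf-8") as f:
        json.dump(sample, f, ensure_ascii=False)

    # Decoding with cp1252 (the default seen in the traceback) fails while
    # the file is being read, before any JSON parsing happens.
    try:
        with open(path, encoding="cp1252") as f:
            json.load(f)
        cp1252_failed = False
    except UnicodeDecodeError as exc:
        cp1252_failed = True
        failing_byte = exc.object[exc.start]  # the byte the codec rejected

    # Passing the encoding to open() decodes the same bytes correctly.
    with open(path, encoding="utf-8") as f:
        data = json.load(f)

print(cp1252_failed, data == sample)
```

    If the real file still fails with encoding='utf-8', it is probably not valid UTF-8 after all, and the new error message should report the position of the first invalid byte.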