With the following python code:
filePath = urllib2.urlopen('xx.json')
fileJSON = json.loads(filePath.read().decode('utf-8'))
Where the xx.json looks like:
{
"tags": [{
"id": "123",
"name": "Airport",
"name_en": "Airport",
"name_cn": "机场",
"display": false
}]
}
I see the following exception:
fileJSON = json.loads(filePath.read().decode('utf-8'))
File "/usr/lib64/python2.7/json/__init__.py", line 339, in loads
return _default_decoder.decode(s)
File "/usr/lib64/python2.7/json/decoder.py", line 364, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "/usr/lib64/python2.7/json/decoder.py", line 382, in raw_decode
raise ValueError("No JSON object could be decoded")
ValueError: No JSON object could be decoded
The code works before the Chinese characters are added to the json file, when I also added the .decode('utf-8') behind the read() as well.
I am not sure what needs to be done?
$ wget https://s3.amazonaws.com/wherego-sims/tags.json
$ file tags.json
tags.json: UTF-8 Unicode (with BOM) text, with CRLF line terminators
This file begins with a byte order mark (EF BB BF), which is illegal in JSON (JSON Specification and usage of BOM/charset-encoding). You must first decode this using 'utf-8-sig'
in Python to get a valid JSON unicode string.
json.loads(filePath.read().decode('utf-8-sig'))
For what it's worth, Python 3 (which you should be using) will give a specific error in this case and guide you in handling this malformed file:
json.decoder.JSONDecodeError: Unexpected UTF-8 BOM (decode using utf-8-sig): line 1 column 1 (char 0)
Namely, by specifying that you wish to discard the BOM if it exists (again, it's not conventional to use this in UTF-8, particularly with JSON which is always encoded in UTF-8 so it is worse than useless):
>>> import json
>>> json.load(open('tags.json', encoding='utf-8-sig'))