I have a large JSON file that contains two lists of JSON objects, one directly after the other.
Example data:
[{"a":1}][{"b":2}]
import json
message = json.load(open("data.json"))
for m in message:
    print(m)
As expected, I get a ValueError:
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/__init__.py", line 290, in load
    **kw)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/__init__.py", line 338, in loads
    return _default_decoder.decode(s)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/decoder.py", line 369, in decode
    raise ValueError(errmsg("Extra data", s, end, len(s)))
ValueError: Extra data: line 1 column 10 - line 1 column 19 (char 9 - 18)
I thought of splitting the file by tracking the character count. What would be the Pythonic way to handle this?
You could use json.JSONDecoder.raw_decode(), which parses one complete JSON document starting at a given index in a string and returns it together with the index where that document ended. Feeding that index back in lets you step through the documents one at a time:
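from json import JSONDecoder, JSONDecodeError  # JSONDecodeError needs Python 3.5+
decoder = JSONDecoder()
data = '[{"a":1}][{"b":2}]'
pos = 0
while True:
    try:
        # raw_decode returns the parsed object and the index just past it
        o, pos = decoder.raw_decode(data, pos)
        print(o)
    except JSONDecodeError:
        # Nothing left to parse (this also stops silently on malformed input)
        break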
Result:
[{'a': 1}]
[{'b': 2}]
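One caveat: raw_decode() does not skip leading whitespace, so if the lists in your file are separated by spaces or newlines, the loop above would stop early without telling you. Here is a minimal sketch of a more defensive variant, assuming the whole input fits in memory as a single string. The name iter_json_documents and the sample string are made up for illustration:

```python
import json


def iter_json_documents(text):
    """Yield each top-level JSON document found back to back in text."""
    decoder = json.JSONDecoder()
    pos = 0
    end = len(text)
    while pos < end:
        # raw_decode does not skip leading whitespace, so skip it by hand
        while pos < end and text[pos] in " \t\r\n":
            pos += 1
        if pos == end:
            break
        # Malformed input raises an error here instead of ending the loop quietly
        obj, pos = decoder.raw_decode(text, pos)
        yield obj


sample = '[{"a":1}]\n[{"b":2}]  '
for doc in iter_json_documents(sample):
    print(doc)
```

For a file you would pass in the result of f.read(). That loads everything into memory at once, so for a very large file you may need a chunked approach instead.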