Sometimes I download data from a JSON API and the response is cut off midway, usually because of a network timeout or a similar issue. In those cases I would still like to read the data that did arrive. Here is an example:
{
"response": 200,
"message": null,
"params": [],
"body": {
"timestamp": 1546033192,
"_d": [
{"id": "FMfcgxwBTsWRDsWDqgqRtZlLMdpCpTDz"},
{"id": "FMfcgxwBTkFSKqRrcKzMFvLCjDSSbrJH"},
{"id": "Fmfgo9
I would like to "complete the string" so that the incomplete response can be parsed as JSON, dropping the element that was cut off. For example:
s = '''
{
"response": 200,
"message": null,
"params": [],
"body": {
"timestamp": 1546033192,
"_d": [
{"id": "FMfcgxwBTsWRDsWDqgqRtZlLMdpCpTDz"},
{"id": "FMfcgxwBTkFSKqRrcKzMFvLCjDSSbrJH"}
]
}
}'''
json.loads(s)
{'response': 200, 'message': None, 'params': [], 'body': {'timestamp': 1546033192, '_d': [{'id': 'FMfcgxwBTsWRDsWDqgqRtZlLMdpCpTDz'}, {'id': 'FMfcgxwBTkFSKqRrcKzMFvLCjDSSbrJH'}]}}
How can I do this for an arbitrary truncated JSON document like the one above?
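For context, here is a minimal sketch of why the truncated text cannot simply be handed to the standard parser. The `try_parse` helper and the shortened `truncated` sample are made up for illustration; only `json.loads` and `json.JSONDecodeError` come from the standard library.

```python
import json

# A shortened, hypothetical stand-in for a response cut off mid-value.
truncated = '{"body": {"_d": [{"id": "abc"}, {"id": "de'


def try_parse(text):
    """Return (parsed, None) on success or (None, error) on failure."""
    try:
        return json.loads(text), None
    except json.JSONDecodeError as exc:
        return None, exc


parsed, error = try_parse(truncated)
if error is not None:
    # error.pos is the character offset where the parser gave up
    print("could not parse, failed at offset", error.pos)
```

The parser rejects the whole document instead of returning the part it understood, so the missing closing characters have to be supplied before parsing.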
Here is how I did it: I build a stack of the } and ] characters needed to 'finish off' the structure. It's a bit verbose and could be cleaned up, but it works on the few string inputs I've tried:
s='''{
"response": 200,
"message": null,
"params": [],
"body": {
"timestamp": 1546033192,
"_d": [
{"id": "FMfcgxwBTsWRDsWDqgqRtZlLMdpCpTDz"},
{"id": "FMfcgxwBTkFSKqRrcKzMFvLCjDSSbrJH"},
{"id": "Fmfgo9'''
>>> f.complete_json_structure(s)
{'response': 200, 'message': None, 'params': [], 'body': {'timestamp': 1546033192, '_d': [{'id': 'FMfcgxwBTsWRDsWDqgqRtZlLMdpCpTDz'}, {'id': 'FMfcgxwBTkFSKqRrcKzMFvLCjDSSbrJH'}]}}
Here is the code:
import json


class FileParserError(Exception):
    """Raised when the truncated JSON cannot be completed."""


def complete_json_structure(json_string):
    # Build the 'unfinished character' stack of unclosed '{' and '['.
    # Note: brackets inside string values are counted too.
    unfinished = []
    for char in json_string:
        if char in ('{', '['):
            unfinished.append(char)
        elif char in ('}', ']'):
            inverse_char = '{' if char == '}' else '['
            # Remove the most recent matching opener, if there is one
            if inverse_char in unfinished:
                unfinished.reverse()
                unfinished.remove(inverse_char)
                unfinished.reverse()

    # Build the 'closing occurrence string', innermost closer first
    unfinished.reverse()
    unfinished_str = ''.join('}' if char == '{' else ']' for char in unfinished)

    # Trim the input from the right until it parses with the closers appended
    while True:
        if not json_string:
            raise FileParserError("Could not parse the JSON file or infer its format.")
        if json_string[-1] in ('}', ']'):
            # Try a second time with one closer dropped, as a sort of hack to fix
            # the "trailing comma issue" (removing the last comma gets tricky)
            for closers in (unfinished_str, unfinished_str[1:]):
                try:
                    return json.loads(json_string + closers)
                except json.JSONDecodeError:
                    pass
        # Guard against an empty closer string before indexing it
        if unfinished_str and json_string[-1] == unfinished_str[0]:
            unfinished_str = unfinished_str[1:]
        json_string = json_string[:-1].strip().rstrip(',')
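One limitation of counting brackets character by character is that a { or ] inside a string value is counted as structure. Below is a hedged sketch of a string-aware variant. It is not the code above, and the name `close_truncated_json` is made up for illustration. It assumes the input is a prefix of valid JSON, and like the approach above it keeps everything up to the last closed object or array and drops whatever follows.

```python
import json


def close_truncated_json(text):
    """Parse the longest prefix of `text` that ends on a closed container."""
    stack = []         # closers still owed, innermost last
    in_string = False  # True while scanning inside a "..." literal
    escaped = False    # True right after a backslash inside a string
    checkpoint = None  # (end index, closers to append) after the last '}' or ']'

    for i, ch in enumerate(text):
        if in_string:
            if escaped:
                escaped = False
            elif ch == '\\':
                escaped = True
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch == '{':
            stack.append('}')
        elif ch == '[':
            stack.append(']')
        elif ch in '}]':
            if not stack or stack.pop() != ch:
                raise ValueError("mismatched bracket at index %d" % i)
            # A container just closed: remember how to finish from here
            checkpoint = (i + 1, ''.join(reversed(stack)))

    if checkpoint is None:
        raise ValueError("no complete object or array found")
    end, closers = checkpoint
    return json.loads(text[:end] + closers)


sample = '{"a": [{"t": "x]}"}, {"t": "y'
print(close_truncated_json(sample))
```

Because the cut point is always taken right after a closed container, there is no trailing comma to strip and no retry loop. The trade-off is that scalar values arriving after the last closed container are discarded.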