
Complete a JSON string from an incomplete HTTP JSON response


Sometimes when I download data from a JSON API, the response is cut off midway, usually because of a network timeout or a similar issue. In those cases I would still like to read the data that did arrive. Here is an example:

{
    "response": 200,
    "message": null,
    "params": [],
    "body": {
        "timestamp": 1546033192,
        "_d": [
                {"id": "FMfcgxwBTsWRDsWDqgqRtZlLMdpCpTDz"},
                {"id": "FMfcgxwBTkFSKqRrcKzMFvLCjDSSbrJH"},
                {"id": "Fmfgo9

I would like to "complete the string" so that I can parse the incomplete response as JSON. For example:

s = '''
{
    "response": 200,
    "message": null,
    "params": [],
    "body": {
        "timestamp": 1546033192,
        "_d": [
                {"id": "FMfcgxwBTsWRDsWDqgqRtZlLMdpCpTDz"},
                {"id": "FMfcgxwBTkFSKqRrcKzMFvLCjDSSbrJH"}
              ]
    }
}'''
json.loads(s)
{'response': 200, 'message': None, 'params': [], 'body': {'timestamp': 1546033192, '_d': [{'id': 'FMfcgxwBTsWRDsWDqgqRtZlLMdpCpTDz'}, {'id': 'FMfcgxwBTkFSKqRrcKzMFvLCjDSSbrJH'}]}}

How can I do this for an arbitrary truncated JSON object like the one above?
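For reference, here is a minimal sketch of the failure that motivates the question: the standard `json` module rejects a truncated document outright rather than returning the part it could read. The helper name `try_parse` and the shortened sample string are made up for illustration.

```python
import json


def try_parse(text):
    """Hypothetical helper: return the parsed value, or None if text is not valid JSON."""
    try:
        return json.loads(text)
    except json.JSONDecodeError as exc:
        # exc.pos is the character offset where the parser gave up
        print(f"parse failed at character {exc.pos}: {exc.msg}")
        return None


# A shortened stand-in for the truncated response shown above
truncated = '{"response": 200, "body": {"_d": [{"id": "abc"}, {"id": "Fm'
print(try_parse(truncated))
```

Note that this helper cannot tell a failed parse apart from a document that is literally `null`; it is only meant to show the error.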


Solution

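  • Here is how I did it: scan the string to build a stack of the brackets that are still open, turn that into the } and ] characters needed to 'finish off' the string, then trim characters from the end until the result parses. It's a bit verbose and could be cleaned up, but it works on the few string inputs I've tried: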

    s='''{
    "response": 200,
    "message": null,
    "params": [],
    "body": {
        "timestamp": 1546033192,
        "_d": [
                {"id": "FMfcgxwBTsWRDsWDqgqRtZlLMdpCpTDz"},
                {"id": "FMfcgxwBTkFSKqRrcKzMFvLCjDSSbrJH"},
                {"id": "Fmfgo9'''
    
    >>> f.complete_json_structure(s)
    {'response': 200, 'message': None, 'params': [], 'body': {'timestamp': 1546033192, '_d': [{'id': 'FMfcgxwBTsWRDsWDqgqRtZlLMdpCpTDz'}, {'id': 'FMfcgxwBTkFSKqRrcKzMFvLCjDSSbrJH'}]}}
    

    Here is the code:

    import json


    class FileParserError(Exception):
        """Raised when no parseable prefix of the input can be found."""


    def complete_json_structure(json_string):
        # Build the 'unfinished character' stack of unclosed '{' and '['.
        # Note: brackets that appear inside string values are counted too.
        unfinished = []
        for char in json_string:
            if char in ('{', '['):
                unfinished.append(char)
            elif char in ('}', ']') and unfinished:
                # A closer cancels the most recent opener
                unfinished.pop()

        # Build the 'closing occurrence string', innermost bracket first
        unfinished_str = ''.join('}' if char == '{' else ']' for char in reversed(unfinished))

        # Trim characters off the end until the remainder parses
        while True:
            if not json_string:
                raise FileParserError("Could not parse the JSON file or infer its format.")

            if json_string[-1] in ('}', ']'):
                # Try the full closing string, then one closer fewer as a sort of
                # hack: by now the innermost opener may have been trimmed away
                for suffix in (unfinished_str, unfinished_str[1:]):
                    try:
                        return json.loads(json_string + suffix)
                    except json.JSONDecodeError:
                        pass

            # Guard against an empty closing string before indexing it
            if unfinished_str and json_string[-1] == unfinished_str[0]:
                unfinished_str = unfinished_str[1:]

            # Drop the last character, then any trailing whitespace and comma
            json_string = json_string[:-1].strip().rstrip(',')
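
One limitation of the scan above is that it counts every bracket, including any that appear inside string values such as "a } b". The following is a minimal sketch of a string-aware variant, not the answer's original method: it recomputes the pending closers for each trimmed candidate and skips candidates that end inside a string literal. The names `closers_for` and `repair_truncated_json` are hypothetical. Like the code above, it only attempts a parse when the candidate ends in } or ], so a half-received number or string is never accepted.

```python
import json


def closers_for(text):
    """Return the closers needed to balance text, or None if text
    ends inside a string literal or has mismatched brackets."""
    stack = []          # pending closers, outermost first
    in_string = False
    escaped = False
    for ch in text:
        if in_string:
            if escaped:
                escaped = False      # this character was escaped; ignore it
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch in "{[":
            stack.append("}" if ch == "{" else "]")
        elif ch in "}]":
            if not stack or stack.pop() != ch:
                return None
    if in_string:
        return None
    return "".join(reversed(stack))


def repair_truncated_json(text):
    """Parse the longest prefix of text that ends in a closer and can be balanced."""
    for end in range(len(text), 0, -1):
        # Drop trailing whitespace and a dangling comma before testing
        candidate = text[:end].rstrip().rstrip(",")
        if not candidate or candidate[-1] not in "}]":
            continue
        closers = closers_for(candidate)
        if closers is None:
            continue
        try:
            return json.loads(candidate + closers)
        except json.JSONDecodeError:
            continue
    raise ValueError("no parseable JSON prefix found")


sample = '{"note": "a } in a string", "items": [{"id": 1}, {"id": 2}, {"id": "Fm'
print(repair_truncated_json(sample))
```

This rescans the candidate on every iteration, so it is quadratic in the worst case. That is acceptable for a sketch, but for very large responses an incremental parser would be a better fit.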