
Why can't Python decode this valid JSON with escaped quotes?


I have this log line that is almost JSON, and one of its values contains something that only resembles JSON:

TEST_LINE = """Oct 21 22:39:28 GMT [TRACE] (Carlos-288) org.some.awesome.LoggerFramework RID=8e9076-4dd9-ec96-8f35-bde193498f: {
    "service": "MyService",
    "operation": "queryShowSize",
    "requestID": "8e9076-4dd9-ec96-8f35-bde193498f",
    "timestamp": 1634815968000,
    "parameters": [
        {
            "__type": "org.some.awsome.code.service#queryShowSizeRequest",
            "externalID": {
                "__type": "org.some.awsome.code.common#CustomerID",
                "value": "48317"
            },
            "CountryID": {
                "__type": "org.some.awsome.code.common#CountryID",
                "value": "125"
            },
            "operationOriginalDate": 1.63462085667E9,
            "operationType": "MeasureWithToes",
            "measureInstrumentIdentifier": "595909-48d2-6115-85e8-b3aa7b"
        }
    ],
    "output": {
        "__type": "org.some.awsome.code.common#queryShowSizeReply",
        "shoeSize": {
            "value": "$ion_1_0 '[email protected]'::'[email protected]'::{customer_id:\"983017317\",measureInstrumentIdentifierTilda:\"595909-48d2-6115-85e8-b3aa7b\",foot_owner:\"Oedipus\",toe_code:\"LR2X10\",account_number_token:\"1234-2838316-1298470\",token_status:VALID,country_code:GRC,measure_store_format:METRIC}"
        }
    }
}
"""

The regex gives me the start of the JSON and I try decoding from there. According to https://jsonlint.com/, it is valid JSON after that point.

So why doesn't Python's JSON module decode it? I get this error:

Exception has occurred: JSONDecodeError
Expecting ',' delimiter: line 25 column 156 (char 992)
  File "/Users/decoder/Downloads/json-problem.py", line 44, in read_json
    d = json.loads(line)
        ^^^^^^^^^^^^^^^^
  File "/Users/decoder/Downloads/json-problem.py", line 48, in <module>
    print(read_json(TEST_LINE))
          ^^^^^^^^^^^^^^^^^^^^
json.decoder.JSONDecodeError: Expecting ',' delimiter: line 25 column 156 (char 992)

Line 25, column 156 points to the first \" in output.shoeSize.value.

But why? That embedded value is only roughly JSON but it should not try to decode it anyway as it is given as a plain string. And the quotes are nicely escaped to not end the string early.

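import json
import re

# Matches the log prefix up to (and including) the ": " before the JSON body.
FIND_JSON = re.compile(
    r"\w{3} \d{2} (\d{2}[: ]){3}GMT \[[^]]+\] \([^)]+\) "
    r"org\.some\.awesome\.LoggerFramework RID=[^:]+: "
)

def read_json(line: str) -> dict | None:
    if not (m := FIND_JSON.match(line)):
        return None
    # Drop the log prefix and decode the remainder as JSON.
    line = line[m.end(0) :]
    d = json.loads(line)
    return d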


print(read_json(TEST_LINE))

I've also tried the raw_decode() but that fails similarly. I don't understand.

Update 1: To the commenter pointing to a non-escaped double quote, I don't see it. For me, the colon is followed by a backslash and then a double quote. And, again, for me the linter says it's good. Is there some copy & paste transformation happening on SO?

Update 2: Added the (still missing) code that makes the problem apparent.


Solution

  • It's not 100% clear how you're defining the string, but the issue is likely that Python processes the escaped quotes and removes the backslashes BEFORE the string is fed into the json library:

    data = """{
    "value": "{customer_id:\"983017317\"..."
    }
    """
    print(data)


    prints:

    {
    "value": "{customer_id:"983017317"..."
    }
    

    As you can see, the escaping is gone. To keep Python from processing the escapes, so that they reach the JSON parser intact, declare the literal as a raw string with the r prefix, i.e.:

    data =r"""{
    "value": "{customer_id:\"983017317\"..."
    }"""
    print(data)
    

    prints:

    {
    "value": "{customer_id:\"983017317\"..."
    }
    

    This string can then be fed into json.loads() without any issues.
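To check the difference yourself, here is a minimal sketch that feeds both variants to json.loads(). The shortened payload and the names plain, raw, plain_error and parsed are made up for illustration; they are not from the original code.

```python
import json

# Hypothetical, shortened payload. In a normal literal Python turns \" into ",
# so the backslashes never reach the JSON parser.
plain = """{
"value": "{customer_id:\"983017317\"}"
}"""

# Same text as a raw literal: the backslashes are kept as-is.
raw = r"""{
"value": "{customer_id:\"983017317\"}"
}"""

try:
    json.loads(plain)
    plain_error = None
except json.JSONDecodeError as exc:
    # The inner quote ends the JSON string early, so decoding fails.
    plain_error = exc

# The raw literal still contains \" and decodes cleanly.
parsed = json.loads(raw)

print(plain_error)
print(parsed["value"])
```

Note that this only concerns string literals in Python source code. Text read from a log file never goes through Python's literal parsing, so its backslashes arrive at json.loads() unchanged.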