I'm trying to understand why the code below doesn't work in Python:
import json
s = json.loads(' {"Testing" : "This quo\\\\"te String"} ')
print(s)
Theoretically, what I should get back is {'Testing' : 'This quo\"te String'}.
These ones work fine:
print(json.loads(' {"Testing" : "This quo\\"te String"} ')) ----> {'Testing' : 'This quo"te String'}
print(json.loads(' {"Testing" :"This quo\\\\\\"te String"}')) ----> {'Testing' : 'This quo\\"te String'}
I'm guessing it has something to do with the idiosyncrasy of having a \" in the dict, but I can't figure out what exactly is happening.
The string This quo\"te String requires two escapes in normal Python: one for the \ and one for the ", making three backslashes in all:
>>> print("This quo\\\"te String")
This quo\"te String
For JSON, each of those backslashes must itself be escaped, because the JSON text is embedded inside a Python string literal. Thus, six backslashes are required in total:
>>> print(json.loads('"This quo\\\\\\"te String"'))
This quo\"te String
However, if raw-strings are used, no extra escapes are required:
>>> print(json.loads(r'"This quo\\\"te String"'))
This quo\"te String
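In your first example, the four backslashes in the Python literal become two backslashes in the JSON text, which the JSON parser reads as a single literal \ (i.e. as an escaped backslash). That leaves the " unescaped, so it ends the string early and json.loads raises an error.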
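To see this for yourself, here is a small sketch that prints the text json.loads actually receives for the first example and then catches the resulting error. The variable names text and failed are only for illustration:

```python
import json

# Four backslashes in the Python literal leave two backslash characters in the text.
text = '{"Testing": "This quo\\\\"te String"}'
print(text)  # {"Testing": "This quo\\"te String"}

try:
    json.loads(text)
    failed = False
except json.JSONDecodeError as exc:
    # JSON reads \\ as one backslash, so the following " closes the string early
    # and the parser then trips over the leftover characters.
    failed = True
    print("JSONDecodeError:", exc)
```

Printing the string before decoding it is usually the quickest way to tell which layer (Python or JSON) consumed which backslash.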
Note that it makes no difference if the string is inside a dict - the result will be exactly the same:
>>> dct = json.loads('{"Testing": "This quo\\\\\\"te String"}')
>>> print(dct['Testing'])
This quo\"te String
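If you are ever unsure how many backslashes are needed, one option (a sketch, not the only way) is to go in the other direction: build the value you want in Python and let json.dumps produce the JSON text. The names target and encoded are made up for this example:

```python
import json

# The value we want back after decoding: This quo\"te String
target = 'This quo\\"te String'

# json.dumps escapes both the backslash and the quote for us.
encoded = json.dumps({"Testing": target})
print(encoded)        # {"Testing": "This quo\\\"te String"}

# repr() shows the same text as a Python literal, with every backslash doubled.
print(repr(encoded))  # '{"Testing": "This quo\\\\\\"te String"}'

# Decoding the generated text gives the original value back.
decoded = json.loads(encoded)
print(decoded["Testing"])  # This quo\"te String
```

The first print shows the three backslashes that JSON itself needs, and the repr shows the six you have to type in a normal (non-raw) Python string.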