I have an unexpected quote in my JSON string that makes json.loads(jstr) fail.
json_str = '''{"id":"9","ctime":"2018-02-13","content":"abcd: "efg.","hots":"103b","date_sms":"2017-11-22"}'''
So I'd like to use a regular expression to match and delete the quote inside the value of "content". I tried a pattern from another solution:
import re
json_str = '''{"id":"9","ctime":"2018-02-13","content":"abcd: "efg.","hots":"103b","date_sms":"2017-11-22"}'''
pa = re.compile(r'(:\s+"[^"]*)"(?=[^"]*",)')
pa.findall(json_str)
[out]: []
Is there any way to fix the string?
As noted by @jonrsharpe, you'd be far better off cleaning the source.
That said, if you do not have control over where the extra quote is coming from, you could use (*SKIP)(*FAIL) with the newer regex module and negative lookarounds, like so:
"[^"]+":\s*"[^"]+"[,}]\s*(*SKIP)(*FAIL)|(?<![,:])"(?![:,]\s*["}])
In Python:
import json, regex as re
json_str = '''{"id":"9","ctime":"2018-02-13","content":"abcd: "efg.","hots":"103b","date_sms":"2017-11-22"}'''
# clean the json
rx = re.compile(r'''"[^"]+":\s*"[^"]+"[,}]\s*(*SKIP)(*FAIL)|(?<![,:])"(?![:,]\s*["}])''')
json_str = rx.sub('', json_str)
# load it
data = json.loads(json_str)  # use a separate name so the json module is not shadowed
print(data['id'])
# 9
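If installing the third-party regex module is not an option, the pattern from the question can be adapted to the standard re module. It returned [] because \s+ requires at least one whitespace character between the colon and the opening quote, and this string has none there. The sketch below only swaps \s+ for \s* and keeps the first group when substituting. It assumes at most one stray quote per value and that the affected value is not the last one in the object (the lookahead expects a closing `",`), so treat it as a stopgap for this input shape rather than a general repair:

```python
import json
import re

json_str = '''{"id":"9","ctime":"2018-02-13","content":"abcd: "efg.","hots":"103b","date_sms":"2017-11-22"}'''

# Pattern from the question with \s* instead of \s+, since there is
# no whitespace between ':' and the opening quote in this string.
pa = re.compile(r'(:\s*"[^"]*)"(?=[^"]*",)')

# Keep group 1 (colon, opening quote, text before the stray quote)
# and drop the stray quote that follows it.
fixed = pa.sub(r'\1', json_str)

data = json.loads(fixed)
print(data['content'])
# abcd: efg.
```

Well-formed pairs are left alone here because, after their closing quote, the lookahead finds `,"` followed by a key character rather than another `",`.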