I have JSON that contains HTML, and I need to make it parsable. Pandas can't import it as it stands.
text = """[{
"article_id": 3540349,
"site_id": 1563,
"domain": "https:\/\/ear.rt.hm",
"code": "wta-jurmala-benara-u-ctrtl",
"uri": "https:\/\/ar.rl.hq\/spormala-berera-u-cetinalu\/",
"content_type": {
"id": 1,
"name": "article"
},
"article_type": {
"id": 1,
"name": "article"
},
"created": "2019-07-25 23:58:20",
"modified": "2019-07-25 23:59:19",
"publish_date": "2019-07-25 23:58:00",
"active": true,
"author": "<a href=\"https:\/\/spt02.com\" target=\"_blank\">I
Kapri<\/a>"
}]"""
text = text.replace('\"', "'")
The result is (never mind the difference in the text):
'author': '<a href='https:\/\/spo.hq' target='_blank'>Iv<\/a>'
When I try to replace '\"' I then get:
"author": "<a href="https:\/\/spr.hq" target="_blank">Ilari<\/a>"
Which again isn't what I want.
Does anyone know how to properly replace \" with ' ?
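The problem is that in a regular Python string literal, \" is an escape sequence for a plain ", so the backslashes are already gone before you call replace or parse anything, and the quotes inside the HTML end the JSON string early. Keep the backslashes by using a raw string: add an r in front of the opening """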
import json
text = r"""[{
"article_id": 35449,
"site_id": 153,
"domain": "https:\/\/ezt.hq",
"code": "wta-jurrda-pe-cetlu",
"uri": "https:\/\/ezl.hr\/s0349\/wla-balu\/",
"content_type": {
"id": 1,
"name": "article"
},
"article_type": {
"id": 1,
"name": "article"
},
"created": "2019-07-25 23:58:20",
"modified": "2019-07-25 23:59:19",
"publish_date": "2019-07-25 23:58:00",
"active": true,
"author": "<a href=\"https:\/\/spr2.hr\" target=\"_blank\">Iari<\/a>"
}]"""
obj = json.loads(text)
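To see the difference for yourself, here is a minimal sketch that compares a regular literal with a raw one. The shortened JSON and the names plain, raw, plain_ok and author are made up for illustration:

```python
import json

# Regular literal: Python turns \" into a bare " before json ever sees the text.
plain = """{"author": "<a href=\"x\" target=\"_blank\">I</a>"}"""
# Raw literal: the backslashes stay in the string, so the JSON remains valid.
raw = r"""{"author": "<a href=\"x\" target=\"_blank\">I</a>"}"""

print("\\" in plain)  # no backslash survived in the regular literal
print("\\" in raw)    # the raw literal still contains them

# The regular literal now has unescaped quotes inside a JSON string.
try:
    json.loads(plain)
    plain_ok = True
except json.JSONDecodeError:
    plain_ok = False

# The raw literal parses, and json unescapes \" for you.
author = json.loads(raw)["author"]
print(plain_ok, author)
```

So there is no need to replace the quotes at all: once the backslashes survive, json.loads handles the escaping.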
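If you read the text from a txt file instead, replace text = r"""..."""
with text = open(file_name).read(). Text read from a file is not processed for escape sequences, so the backslashes arrive intact.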
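For the file case, a slightly more careful sketch uses a with block so the file gets closed, and json.load to read and parse in one step. The temporary file, the sample content and the name records are stand-ins for your real data:

```python
import json
import os
import tempfile

# Hypothetical sample written to a temporary file, standing in for your real file.
sample = r'[{"author": "<a href=\"https:\/\/example.com\">I<\/a>"}]'
with tempfile.NamedTemporaryFile(
    "w", suffix=".json", delete=False, encoding="utf-8"
) as tmp:
    tmp.write(sample)
    file_name = tmp.name

# json.load parses straight from the file object; the with block closes the file.
with open(file_name, encoding="utf-8") as f:
    records = json.load(f)

os.remove(file_name)  # clean up the temporary file
print(records[0]["author"])
```

Once parsed, the list of dicts can be handed to pandas, for example with pandas.json_normalize(records), which should flatten nested fields such as content_type into columns.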