
Python: Replace an escaped quote with another character


I have a JSON document that contains HTML, and I need to make it parsable. Pandas can't import it as it stands.

text = """[{
   "article_id": 3540349,
   "site_id": 1563,
   "domain": "https:\/\/ear.rt.hm",
   "code": "wta-jurmala-benara-u-ctrtl",
   "uri": "https:\/\/ar.rl.hq\/spormala-berera-u-cetinalu\/",
   "content_type": {
       "id": 1,
       "name": "article"
   },
   "article_type": {
       "id": 1,
       "name": "article"
   },
   "created": "2019-07-25 23:58:20",
   "modified": "2019-07-25 23:59:19",
   "publish_date": "2019-07-25 23:58:00",
   "active": true,
   "author": "<a href=\"https:\/\/spt02.com\" target=\"_blank\">I Kapri<\/a>"
}]"""

text = text.replace('\"', "'")

The result is (never mind the difference in the text):

'author': '<a href='https:\/\/spo.hq' target='_blank'>Iv<\/a>'

When I try to replace '\\"' instead, I get:

"author": "<a href="https:\/\/spr.hq" target="_blank">Ilari<\/a>"

Which again wasn't what I wanted.

Does anyone know how to properly replace \" with ' ?


Solution

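  • The problem is that the text is written as a regular Python string literal, so Python itself treats each \" as an escape sequence and drops the backslash before the JSON parser ever sees it. The quotes inside the HTML then end the JSON string early. Make the literal a raw string by adding an r in front of the opening """: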

    import json
    text = r"""[{
       "article_id": 35449,
       "site_id": 153,
       "domain": "https:\/\/ezt.hq",
       "code": "wta-jurrda-pe-cetlu",
       "uri": "https:\/\/ezl.hr\/s0349\/wla-balu\/",
       "content_type": {
           "id": 1,
           "name": "article"
       },
       "article_type": {
           "id": 1,
           "name": "article"
       },
       "created": "2019-07-25 23:58:20",
       "modified": "2019-07-25 23:59:19",
       "publish_date": "2019-07-25 23:58:00",
       "active": true,
       "author": "<a href=\"https:\/\/spr2.hr\" target=\"_blank\">Iari<\/a>"
    }]"""
    obj = json.loads(text)
    

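    If you read the text from a file instead, no r prefix is needed, because escape sequences are only processed in string literals in source code. Replace text = r"""...""" with text = open(file_name).read()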
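
    The difference between the two kinds of literal can be checked with a small sketch. The record below is a shortened, made-up stand-in for the one in the question, and the file name article.json is hypothetical; only the escaped quotes matter here.

```python
import json
import os
import tempfile

# Hypothetical, shortened record; the same text as a regular and as a raw literal.
regular = """{"author": "<a href=\"#\">Name</a>"}"""
raw = r"""{"author": "<a href=\"#\">Name</a>"}"""

# In a regular literal, \" is just another way of writing ",
# which is why text.replace('\"', "'") replaces every double quote.
same_char = '\"' == '"'

# The regular literal has already lost its backslashes, so it is not valid JSON.
try:
    json.loads(regular)
    regular_failed = False
except json.JSONDecodeError:
    regular_failed = True

# The raw literal keeps the backslashes, and the JSON parser unescapes them itself.
obj = json.loads(raw)

# Text read from a file is never processed for Python escapes,
# so the same content parses without any r prefix.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "article.json")  # hypothetical file name
    with open(path, "w", encoding="utf-8") as f:
        f.write(raw)
    with open(path, encoding="utf-8") as f:
        from_file = json.load(f)

print(same_char, regular_failed, obj["author"], from_file == obj)
```

    Once json.loads succeeds, no quote replacement is needed at all: the parsed value already contains plain " characters inside the HTML, and the resulting list of dicts can be passed on to pandas.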