python html json unicode ascii

Replace ASCII HTML characters when loading JSON


I'm loading a JSON file made up of Yelp restaurant reviews, and I deal with problematic Unicode characters this way:

import json

def parse_yelp_restaurant_api(self, response):
    # strict=False allows control characters inside JSON strings
    jsonresponse = json.loads(response.text, strict=False)

Now I would also like to replace the HTML character references with the characters they stand for. My JSON file is full of '&#39', '&#34', etc.


Solution

  • I solved the problem by using html.unescape on the retrieved fields as suggested by Panagiotis Kanavos.

    Using response.json() (as suggested by puchal) also made things easier, since it takes care of guessing the Unicode encoding.
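
For readers who want something concrete, here is a minimal sketch of the html.unescape approach described above. It assumes the decoded JSON is an ordinary nesting of dicts, lists and strings. The helper name unescape_fields and the sample payload are hypothetical, not taken from the original code or from the real Yelp response.

```python
import html
import json


def unescape_fields(value):
    """Recursively apply html.unescape to every string in decoded JSON."""
    if isinstance(value, str):
        # Converts references such as &#39; and &amp; to their characters
        return html.unescape(value)
    if isinstance(value, list):
        return [unescape_fields(item) for item in value]
    if isinstance(value, dict):
        return {key: unescape_fields(item) for key, item in value.items()}
    # Numbers, booleans and None are returned unchanged
    return value


# Hypothetical sample payload; the real response will have different fields.
raw = '{"name": "Joe&#39;s Diner", "reviews": ["&#34;Great&#34; food", "5 &amp; up"], "rating": 4.5}'
data = unescape_fields(json.loads(raw, strict=False))
print(data["name"])
```

Walking the whole structure avoids having to list each field by hand. If only a few fields contain escaped text, calling html.unescape on those fields directly is simpler.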