Encoding issue with requests getting json


I want to get the topography of the French departments with this code:

import pandas as pd
import requests

link_dep = 'https://code.highcharts.com/mapdata/countries/fr/fr-all-all.topo.json'
topo = requests.get(link_dep).json()
x = topo['objects']['default']['geometries']
xx = [y for y in x if y['type'] in ['Polygon', 'MultiPolygon']]
df = pd.json_normalize(xx)

However, df.loc[26, 'properties.name'] returns 'Deux-Sčvres' instead of 'Deux-Sèvres'. The problem does not occur for 'ô' or 'é'.

I understand this is an encoding issue, but I do not know how to adjust my code so that the encoding is correct from the first step.


Solution

  • The JSON response contains the Unicode character \u010d, which prints as č. For è it would have to be \u00e8. So this is not an encoding issue per se: requests decodes the response correctly, and the source data itself is wrong.

    You can replace \u010d with \u00e8 in the "name" value as follows:

    import requests

    # Translation table: map the wrong character (č) to the intended one (è)
    T = str.maketrans({"\u010d": "\u00e8"})

    URL = "https://code.highcharts.com/mapdata/countries/fr/fr-all-all.topo.json"

    with requests.get(URL) as response:
        response.raise_for_status()  # fail early on an HTTP error
        data = response.json()
        for g in data['objects']['default']['geometries']:
            p = g["properties"]
            # Some geometries may have no "name"; only fix the ones that do
            if (name := p.get("name")) is not None:
                p["name"] = name.translate(T)
                print(p["name"])
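As for why only 'è' is affected while 'é' and 'ô' are fine: one plausible explanation (an assumption, not something the data source confirms) is that the names were decoded with a Central European code page such as Windows-1250 somewhere upstream. In that code page the byte 0xE8 is č, while 0xE9 and 0xF4 are é and ô, the same as in Latin-1. The sketch below checks this hypothesis and builds a slightly wider translation table from it. The names FRENCH_ACCENTS, garble and REPAIR are made up for illustration.

```python
# Hypothesis only: the text was Latin-1 bytes wrongly decoded as Windows-1250.
FRENCH_ACCENTS = "àâçèéêëîïôùûü"


def garble(ch: str) -> str:
    """Simulate the suspected mis-decoding for a single character."""
    return ch.encode("latin-1").decode("cp1250")


# Map each garbled character back to the accented letter it came from,
# keeping only the letters that actually change under the mis-decoding.
REPAIR = str.maketrans(
    {garble(ch): ch for ch in FRENCH_ACCENTS if garble(ch) != ch}
)

for ch in "èéô":
    print(repr(ch), "->", repr(garble(ch)))

print("Deux-Sčvres".translate(REPAIR))
```

If the hypothesis holds, REPAIR can be used in place of T in the loop above. It is still a targeted character map, so it leaves everything else in the names untouched, but it would also rewrite a name that legitimately contains one of the Central European letters.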