I want to get the topography of the French departments with this code:
import pandas as pd
import requests
link_dep = 'https://code.highcharts.com/mapdata/countries/fr/fr-all-all.topo.json'
topo = requests.get(link_dep).json()
x = topo['objects']['default']['geometries']
xx = [y for y in x if y['type'] in ['Polygon', 'MultiPolygon']]
df = pd.json_normalize(xx)
But for df.loc[26, 'properties.name'], I get 'Deux-Sčvres' instead of 'Deux-Sèvres'. This issue does not appear for 'ô' or 'é'.
I understand it's an encoding issue, but I don't know how to modify my code so that the encoding is correct from the first step.
The JSON response contains the Unicode character \u010d, which prints as č. For è it would have to be \u00e8. This is not an encoding issue per se: requests decodes the response correctly, and the data itself is wrong.
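One plausible explanation for the wrong character, offered as an assumption rather than anything confirmed by the data provider: the name may have been stored as Latin-1 bytes and decoded as Windows-1250 somewhere upstream. The byte 0xE8 is è in Latin-1 but č in Windows-1250, while é and ô map to the same characters in both code pages, which would match the symptoms. The helper name `misdecode` below is hypothetical and only illustrates that round trip:

```python
# Assumption: the upstream data was Latin-1 bytes wrongly decoded as cp1250.
def misdecode(text: str) -> str:
    """Encode as Latin-1, then decode as cp1250 (the suspected upstream mistake)."""
    return text.encode("latin-1").decode("cp1250")


for ch in "èéô":
    wrong = misdecode(ch)
    # Show each character, what it turns into, and the resulting code point.
    print(ch, "->", wrong, hex(ord(wrong)))

print(misdecode("Deux-Sèvres"))
```

Only è changes in this round trip, which is consistent with 'ô' and 'é' being unaffected in the question.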
You can replace \u010d with \u00e8 in the "name" value as follows:
import requests
T = str.maketrans({"\u010d": "\u00e8"})
URL = "https://code.highcharts.com/mapdata/countries/fr/fr-all-all.topo.json"
with requests.get(URL) as response:
    response.raise_for_status()
    data = response.json()
    for g in data['objects']['default']['geometries']:
        p = g["properties"]
        if (name := p.get("name")) is not None:
            p["name"] = name.translate(T)
            print(p["name"])
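To fit the same fix into the workflow from the question, the translation can be applied while filtering the geometries, before they are turned into a DataFrame. The following is a minimal sketch: the function name `clean_geometries` and the `sample` records are made up for illustration, and it assumes that no legitimate department name contains č.

```python
# Translation table: replace the wrong character (č) with the intended one (è).
FIX = str.maketrans({"\u010d": "\u00e8"})


def clean_geometries(geometries):
    """Keep Polygon/MultiPolygon entries and repair their 'name' property."""
    cleaned = []
    for g in geometries:
        if g.get("type") not in ("Polygon", "MultiPolygon"):
            continue
        # Copy the properties so the input records are left untouched.
        props = dict(g.get("properties") or {})
        name = props.get("name")
        if isinstance(name, str):
            props["name"] = name.translate(FIX)
        cleaned.append({**g, "properties": props})
    return cleaned


# Hypothetical sample records shaped like the TopoJSON geometries.
sample = [
    {"type": "Polygon", "properties": {"name": "Deux-S\u010dvres"}},
    {"type": "MultiPolygon", "properties": {"name": "C\u00f4te-d'Or"}},
    {"type": "Point", "properties": {"name": "ignored"}},
]
fixed = clean_geometries(sample)
print([g["properties"]["name"] for g in fixed])
```

With the real data, the result of clean_geometries(topo['objects']['default']['geometries']) can be passed to pd.json_normalize in place of xx.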