python json pandas urllib

Turn a GeoJSON URL into a pandas DataFrame (parsing)


I'm trying to load a GeoJSON file from a URL into a pandas DataFrame. I can already read the file, but when I convert it to a DataFrame the result is not what I expect.

!wget -q -O 'wuppertal.json' https://offenedaten-wuppertal.de/sites/default/files/Stadtbezirke_EPSG4326_JSON.json
print('Data downloaded!')


import urllib.request, json
with urllib.request.urlopen("https://offenedaten-wuppertal.de/sites/default/files/Stadtbezirke_EPSG4326_JSON.json") as url:
    wuppertal_data = json.loads(url.read().decode())
print(wuppertal_data)

out: "{'type': 'FeatureCollection', 'name': 'Stadtbezirke_EPSG4326_JSON', 'features': [{'type': 'Feature', 'properties': {'NAME': 'Langerfeld-Beyenburg', 'BEZIRK': '8', 'FLAECHE': 29391400}, 'geometry': {'type': 'Polygon', 'coordinates': [[[7.2510191991, 51.2917076298], [7.2505557773, 51.292270028], [7.2500517827, 51.2927158789], [7.2494997409, 51.2930461331], [7.2490203901, 51.2932752364], [7.2486303801, 51.2934148015], [7.2485802227, 51.2934327502], [7.2485234407, 51.2934518902], [7.2480306248, 51.2936180109], [7.2474431132, 51.293759494], [7.2471658102, 51.293788208], [7.2470561109, 51.2937995666], [7.24715411, 51.2937666386],...."


neighborhoods_data = wuppertal_data['features']
out: {'geometry': {'coordinates': [[[7.2510191991, 51.2917076298],
[7.2505557773, 51.292270028],
[7.2500517827, 51.2927158789],
[7.2494997409, 51.2930461331],
[7.2490203901, 51.2932752364],
[7.2486303801, 51.2934148015],


import pandas as pd

rows = []  # one record per district (DataFrame.append was removed in pandas 2.0)
for data in neighborhoods_data:
    neighborhood_name = data['properties']['NAME']
    coordinates = data['geometry']['coordinates']
    rows.append({'Neighborhood': neighborhood_name,
                 'Coordinates': coordinates})
neighborhoods = pd.DataFrame(rows)


out : Neighborhood  Coordinates
0   Langerfeld-Beyenburg    [[[7.2510191991, 51.2917076298], [7.2505557773...
1   Uellendahl-Katernberg   [[[7.1677144694, 51.3126516481], [7.1674618797...
2   Cronenberg  [[[7.1173964686, 51.2337079198], [7.117197067,...

The problem is that each row of my table holds one neighborhood with all of its coordinates aggregated into a single cell.

I would like one row per coordinate pair instead: Neighborhood / Latitude / Longitude

e.g: barmen/32,34/21,34
 barmen/..
 ...
So the neighborhood name would be repeated on every row.

Thanks in advance for any help!


Solution

  • There might be a more efficient way, but this does the trick:

    import json
    import urllib.request

    import pandas as pd

    with urllib.request.urlopen("https://offenedaten-wuppertal.de/sites/default/files/Stadtbezirke_EPSG4326_JSON.json") as url:
        wuppertal_data = json.loads(url.read().decode())

    neighborhoods_data = wuppertal_data['features']

    frames = []
    for data in neighborhoods_data:
        neighborhood_name = data['properties']['NAME']
        # A Polygon's first ring is its outer boundary.
        # GeoJSON stores each position as [longitude, latitude], in that order.
        outer_ring = data['geometry']['coordinates'][0]
        temp_df = pd.DataFrame(outer_ring, columns=['Longitude', 'Latitude'])
        temp_df['Neighborhood'] = neighborhood_name
        frames.append(temp_df)

    # DataFrame.append was removed in pandas 2.0; concatenate once at the end instead
    results = pd.concat(frames, ignore_index=True)
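
As for the "more efficient way": one option is to flatten the features into plain rows first and build the DataFrame only once at the end. The sketch below is not taken from the original answer. `flatten_features` is a hypothetical helper name, the sample data is made up to mimic the shape of the Wuppertal file, and the `MultiPolygon` branch is an assumption in case some districts use that geometry type. Unlike the loop above, it keeps every ring of a polygon (including holes), not only the outer one.

```python
def flatten_features(feature_collection, name_key="NAME"):
    """Return one dict per boundary vertex: Neighborhood, Longitude, Latitude."""
    rows = []
    for feature in feature_collection["features"]:
        name = feature["properties"][name_key]
        geometry = feature["geometry"]
        # Normalise both geometry types to a list of polygons,
        # where each polygon is a list of rings.
        if geometry["type"] == "Polygon":
            polygons = [geometry["coordinates"]]
        elif geometry["type"] == "MultiPolygon":
            polygons = geometry["coordinates"]
        else:
            continue  # other geometry types are skipped in this sketch
        for rings in polygons:
            for ring in rings:
                for position in ring:
                    # GeoJSON positions are [longitude, latitude]
                    rows.append({"Neighborhood": name,
                                 "Longitude": position[0],
                                 "Latitude": position[1]})
    return rows


# Made-up sample with the same structure as the downloaded file
sample = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "properties": {"NAME": "A"},
         "geometry": {"type": "Polygon",
                      "coordinates": [[[7.25, 51.29], [7.24, 51.30], [7.25, 51.29]]]}},
        {"type": "Feature", "properties": {"NAME": "B"},
         "geometry": {"type": "MultiPolygon",
                      "coordinates": [[[[7.10, 51.20], [7.11, 51.21], [7.10, 51.20]]]]}},
    ],
}

rows = flatten_features(sample)
print(len(rows), rows[0])
```

With the real data, `pd.DataFrame(flatten_features(wuppertal_data))` should then give one row per coordinate pair with the neighborhood name repeated, which is the layout asked for in the question.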