
How can I loop through a nested list to store the values in a data frame?


Given a list of nested dictionaries neighborhood_data, where the first item, i.e. neighborhood_data[0], displays

{'type': 'Feature',
 'geometry': {'type': 'MultiPolygon',
  'coordinates': [[[[28.073783, -26.343133],
     [28.071239, -26.351536],
     [28.068717, -26.350644],
     [28.06663, -26.351362],
     [28.065161, -26.352135],
     [28.064671, -26.35399]]]],
'properties': {'cartodb_id': 1,
  'subplace_c': 761001001,
  'province': 'Gauteng',
  'wardid': '74202012',
  'district_m': 'Sedibeng',
  'local_muni': 'Midvaal',
  'main_place': 'Alberton',
  'mp_class': 'Settlement',
  'sp_name': 'Brenkondown',
  'suburb_nam': 'Brenkondown',
  'metro': 'Johannesburg',
  'african': 330,
  'white': 24,
  'asian': 0,
  'coloured': 2,
  'other': 0,
  'totalpop': 356}}}

I then created an empty data frame, neighborhoods:

import pandas as pd

# define the dataframe columns
column_names = ['Province', 'District', 'Local_municipality', 'Main Place', 'Suburb', 'Metro', 'Latitude', 'Longitude']

# instantiate the dataframe
neighborhoods = pd.DataFrame(columns=column_names)

However, when I looped through neighborhood_data to store the relevant data in the neighborhoods data frame, I got the following error:

for data in neighborhood_data:
    province = data['properties']['province']
    district = data['properties']['district_m']
    local_muni_name = suburb_name = data['properties']['local_muni'] 
    suburb_name = data['properties']['suburb_nam']
    metro = data['properties']['metro']
    
    suburb_latlon = data['geometry']['coordinates']
    subur_lat = suburb_latlon[[[[1]]]]
    suburb_lon = suburb_latlon[[[[0]]]]
    
    neighborhoods = neighborhoods.append({'Province': province,
                                          'District': district,
                                          'Local_municipality': local_muni_name,
                                          'Main place': main_place,
                                          'Suburb': suburb_name,
                                          'Metro': metro,
                                          'Latitude': suburb_lat,
                                          'Longitude': suburb_lon}, ignore_index=True)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-17-a5dc74ed4207> in <module>
      7 
      8     suburb_latlon = data['geometry']['coordinates']
----> 9     subur_lat = suburb_latlon[[[[1]]]]
     10     suburb_lon = suburb_latlon[[[[0]]]]
     11 

TypeError: list indices must be integers or slices, not list

So how can I store the latitude and longitude coordinates in the columns 'Latitude' and 'Longitude' for the empty data frame?


Solution

  • Your dictionary is malformed: the coordinates key is missing closing square brackets. Let's assume that this is the correct dictionary:

    {'geometry': {'coordinates': [[[[28.073783, -26.343133],
         [28.071239, -26.351536],
         [28.068717, -26.350644],
         [28.06663, -26.351362],
         [28.065161, -26.352135],
         [28.064671, -26.35399]]]],
      'properties': {'african': 330,
       'asian': 0,
       'cartodb_id': 1,
       'coloured': 2,
       'district_m': 'Sedibeng',
       'local_muni': 'Midvaal',
       'main_place': 'Alberton',
       'metro': 'Johannesburg',
       'mp_class': 'Settlement',
       'other': 0,
       'province': 'Gauteng',
       'sp_name': 'Brenkondown',
       'subplace_c': 761001001,
       'suburb_nam': 'Brenkondown',
       'totalpop': 356,
       'wardid': '74202012',
       'white': 24},
      'type': 'MultiPolygon'},
     'type': 'Feature'}
    

    Then, instead of indexing the coordinates like this:

    suburb_latlon = data['geometry']['coordinates']
    subur_lat = suburb_latlon[[[[1]]]] # <--- Indexing error here
    suburb_lon = suburb_latlon[[[[0]]]] # <--- Indexing error here
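
To see why these two lines fail: Python reads `suburb_latlon[[[[1]]]]` as a single subscript whose index is the nested list `[[[1]]]`, and a list cannot be indexed by another list. The following is a minimal sketch using a trimmed copy of the coordinates shown above; the variable names `coordinates`, `error_message`, and `first_point` are introduced here for illustration only.

```python
# Trimmed copy of the coordinates from the example feature.
coordinates = [[[[28.073783, -26.343133],
                 [28.071239, -26.351536]]]]

# coordinates[[[[1]]]] is one subscript whose index is the list [[[1]]],
# so Python raises a TypeError instead of descending four levels.
try:
    coordinates[[[[1]]]]
except TypeError as exc:
    error_message = str(exc)

# Chained subscripts descend one list level at a time:
# polygon -> ring -> point -> value.
first_point = coordinates[0][0][0]  # a [longitude, latitude] pair
longitude = first_point[0]
latitude = first_point[1]
```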
    

    We want to do the following (unpack through the extra list levels until we have our coordinate):

    suburb_latlon = data['geometry']['coordinates']
    # This picks the first point of the first ring of the first polygon.
    # Not sure why you would pick the first one, but you can customize the indices as needed.
    suburb_lat = suburb_latlon[0][0][0][1]  # latitude is the second value of the point
    suburb_lon = suburb_latlon[0][0][0][0]  # longitude is the first value of the point
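
Putting the indexing together with the loop from the question, one possible sketch is shown below. It rests on several assumptions: `neighborhood_data` here is a hypothetical one-item stand-in with trimmed coordinates, `properties` sits at the top level of each feature (as the question's loop accesses it), and the first point of the first polygon is an acceptable representative location. It also collects the rows in a plain list and builds the data frame once at the end, since `DataFrame.append` was deprecated in pandas 1.4 and removed in pandas 2.0.

```python
import pandas as pd

# Hypothetical stand-in for the real data: a single trimmed feature.
neighborhood_data = [
    {
        "type": "Feature",
        "geometry": {
            "type": "MultiPolygon",
            "coordinates": [[[[28.073783, -26.343133],
                              [28.071239, -26.351536]]]],
        },
        "properties": {
            "province": "Gauteng",
            "district_m": "Sedibeng",
            "local_muni": "Midvaal",
            "main_place": "Alberton",
            "suburb_nam": "Brenkondown",
            "metro": "Johannesburg",
        },
    }
]

column_names = ["Province", "District", "Local_municipality", "Main Place",
                "Suburb", "Metro", "Latitude", "Longitude"]

rows = []
for data in neighborhood_data:
    props = data["properties"]
    # First point of the first ring of the first polygon: [longitude, latitude].
    first_point = data["geometry"]["coordinates"][0][0][0]
    rows.append({
        "Province": props["province"],
        "District": props["district_m"],
        "Local_municipality": props["local_muni"],
        "Main Place": props["main_place"],
        "Suburb": props["suburb_nam"],
        "Metro": props["metro"],
        "Latitude": first_point[1],
        "Longitude": first_point[0],
    })

# Build the data frame in one step from the collected rows.
neighborhoods = pd.DataFrame(rows, columns=column_names)
```

Note that the dictionary keys must match the column names exactly: the question defines the column as 'Main Place' but appends under 'Main place', and never assigns main_place inside the loop.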