Tags: python, dictionary, filter, nested, geojson

How do I filter a GeoJSON file for specific countries?


Challenge: I'm trying to build a new dictionary from my GeoJSON dictionary that contains only the countries of interest, because the raw GeoJSON file is too large to visualize.

I have a GeoJSON file with the form below, and I've created an empty dictionary that replicates its structure:

newData = {'features': {},
           'properties':{'ADMIN':"",
                         'ISO_A3':"",
                         },
           'geometry':{'type':"",
                       'coordinates':""
                       },
           'id':""
           }

Below is an example of one of the elements from the geojson file:

data['features'][3]

{'type': 'Feature',
 'properties': {'ADMIN': 'Aruba', 'ISO_A3': 'ABW'},
 'geometry': {'type': 'Polygon',
  'coordinates': [[[-69.99693762899992, 12.577582098000036],
    [-69.93639075399994, 12.53172435100005],
    [-69.92467200399994, 12.519232489000046],
    [-69.91576087099992, 12.497015692000076],
    [-69.88019771999984, 12.453558661000045],
    [-69.87682044199994, 12.427394924000097],
    [-69.88809160099993, 12.417669989000046],
    [-69.90880286399994, 12.417792059000107],
    [-69.93053137899989, 12.425970770000035],
    [-69.94513912699992, 12.44037506700009],
    [-69.92467200399994, 12.44037506700009],
    [-69.92467200399994, 12.447211005000014],
    [-69.95856686099992, 12.463202216000099],
    [-70.02765865799992, 12.522935289000088],
    [-70.04808508999989, 12.53115469000008],
    [-70.05809485599988, 12.537176825000088],
    [-70.06240800699987, 12.546820380000057],
    [-70.06037350199995, 12.556952216000113],
    [-70.0510961579999, 12.574042059000064],
    [-70.04873613199993, 12.583726304000024],
    [-70.05264238199993, 12.600002346000053],
    [-70.05964107999992, 12.614243882000054],
    [-70.06110592399997, 12.625392971000068],
    [-70.04873613199993, 12.632147528000104],
    [-70.00715084499987, 12.5855166690001],
    [-69.99693762899992, 12.577582098000036]]]},
 'id': 'ABW'}

I also have a data frame object of the countries that I'm actually interested in analyzing:

df_Country.head()
2                   Italy
3                   Spain
4                Portugal
5    United Arab Emirates
6                   Egypt

The GeoJSON file contains many countries that are unnecessary for my analysis, so I'd like to filter them out. I believe this is similar to filtering a nested dictionary. I've tried creating an empty dictionary and looping over the features, copying in the values from data whenever the country matches one in df_Country. Below is what I've attempted:

for i in range(len(data['features'])):
  if data['features'][i]['properties']['ADMIN'] in df_Country:
    newData['properties']['ADMIN'] = data['features'][i]['properties']['ADMIN']
    newData['properties']['ISO_A3'] = data['features'][i]['properties']['ISO_A3']
    newData['geometry']['type'] = data['features'][i]['geometry']['type']
    newData['geometry']['coordinates'] = data['features'][i]['geometry']['coordinates']
    newData['id'] = data['features'][i]['id']

At the end of this, my newData dictionary is still empty. Any thoughts? Thank you in advance!


Solution

  • You were really close! You can do a one-liner list comprehension like this:

    # example data
    geo_json = [
        {'type': 'Feature',
         'properties': {'ADMIN': 'Italy', 'ISO_A3': 'ABW'},
         'geometry': {'type': 'Polygon',
                      'coordinates': [[[-69.99693762899992, 12.577582098000036],
                                       [-69.99693762899992, 12.577582098000036]]]},
         'id': 'ABW'},
        {'type': 'Feature',
         'properties': {'ADMIN': 'Aruba', 'ISO_A3': 'ABW'},
         'geometry': {'type': 'Polygon',
                      'coordinates': [[[-69.99693762899992, 12.577582098000036],
                                       [-69.99693762899992, 12.577582098000036]]]},
         'id': 'ABW'},
        {'type': 'Feature',
         'properties': {'ADMIN': 'Spain', 'ISO_A3': 'ABW'},
         'geometry': {'type': 'Polygon',
                      'coordinates': [[[-69.99693762899992, 12.577582098000036],
                                       [-69.99693762899992, 12.577582098000036]]]},
         'id': 'ABW'},
    ]
    
    # countries you want
    countries = ['Italy', 'Spain']
    
    # new list of geo_json but only ones with ['properties']['ADMIN'] in countries
    filtered = [geo for geo in geo_json if geo['properties']['ADMIN'] in countries]
    
    # pretty print the results
    from pprint import pprint
    pprint(filtered)
    

    The equivalent for loop would look like:

    filtered = []
    for geo in geo_json:
        if geo['properties']['ADMIN'] in countries:
            filtered.append(geo)
    

    Output (only Italy and Spain remain of the 3 features in geo_json):

    [{'geometry': {'coordinates': [[[-69.99693762899992, 12.577582098000036],  
                                    [-69.99693762899992, 12.577582098000036]]],
                   'type': 'Polygon'},
      'id': 'ABW',
      'properties': {'ADMIN': 'Italy', 'ISO_A3': 'ABW'},
      'type': 'Feature'},
     {'geometry': {'coordinates': [[[-69.99693762899992, 12.577582098000036],
                                    [-69.99693762899992, 12.577582098000036]]],
                   'type': 'Polygon'},
      'id': 'ABW',
      'properties': {'ADMIN': 'Spain', 'ISO_A3': 'ABW'},
      'type': 'Feature'}]
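
To apply the same idea to the full file rather than a hand-built list, here is a minimal sketch. It assumes `data` is the loaded GeoJSON dictionary with a top-level 'features' list, as in the question. The helper name `filter_features` and the tiny sample geometries are made up for illustration. Two things likely explain why the original loop left newData unchanged. If df_Country is a pandas Series, `name in df_Country` tests the index labels (2, 3, 4, ...) rather than the country names. And even when a match is found, the loop overwrites the same keys on every pass instead of collecting features. Converting the names to a plain list or set first, for example `set(df_Country)` for a Series, avoids the first problem.

```python
import json


def filter_features(data, wanted_names):
    """Return a new FeatureCollection with only the features whose ADMIN name is wanted.

    `filter_features` is a hypothetical helper name, not part of any library.
    """
    # A set gives fast membership tests that compare against the values themselves.
    wanted = set(wanted_names)
    kept = [
        feature
        for feature in data["features"]
        if feature["properties"]["ADMIN"] in wanted
    ]
    # Wrap the surviving features so the result is still a GeoJSON FeatureCollection.
    return {"type": "FeatureCollection", "features": kept}


# Stand-in for the loaded file: rough placeholder points, not real country borders.
data = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature",
         "properties": {"ADMIN": "Aruba", "ISO_A3": "ABW"},
         "geometry": {"type": "Point", "coordinates": [-69.97, 12.52]},
         "id": "ABW"},
        {"type": "Feature",
         "properties": {"ADMIN": "Italy", "ISO_A3": "ITA"},
         "geometry": {"type": "Point", "coordinates": [12.5, 41.9]},
         "id": "ITA"},
        {"type": "Feature",
         "properties": {"ADMIN": "Spain", "ISO_A3": "ESP"},
         "geometry": {"type": "Point", "coordinates": [-3.7, 40.4]},
         "id": "ESP"},
    ],
}

# With a pandas Series this could be set(df_Country) instead of a literal list.
small = filter_features(data, ["Italy", "Spain"])
names = [feature["properties"]["ADMIN"] for feature in small["features"]]
print(names)

# The filtered dictionary serializes like any other GeoJSON document.
serialized = json.dumps(small)
```

Because the result keeps the 'FeatureCollection' wrapper, it should be usable wherever the original dictionary was, and the original `data` is left untouched.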