
Remove whole nested dict by finding if a value is in a list


I am trying to make a choropleth map of values within zip codes in the US. I have a json file with the points for the ZCTA5CE area that corresponds to each zip code. I am using the Folium package.

Right now the mapping works but is painfully slow: it takes tens of minutes, depending on what else is running on my machine, and interacting with the map by sliding and zooming is nearly impossible. The cause is the size of the json file (482.2M) and thus of the resulting dict.

The data I want to plot does not have information for all zip codes, so I would like to remove the information in the zip code dict associated with those zip codes that are not in my data.

My question is: how can I iterate over a dict of zip code info and remove the dicts for zip codes that are not in a list of zips I specify?

To be clearer about the structure of the dict I'm working with:

zip_code_geo_dict.keys() gives:

dict_keys(['type', 'features'])

where zip_code_geo_dict['type'] is a string, and zip_code_geo_dict['features'] is a list.

Now, zip_code_geo_dict['features'][0] is:

{'type': 'Feature','geometry': {'type': 'MultiPolygon',
'coordinates': [[[[-88.252618, 32.92675],
[-88.249724, 32.93242],
**bajillions of lines of coordinates here**
[-88.34043199999999, 32.991199]]]]},
'properties': {'ZCTA5CE10': '35442',
'AFFGEOID10': '8600000US35442',
'GEOID10': '35442',
'ALAND10': 610213891,
'AWATER10': 10838694}}

My source data can change, so the actual list of zip codes I want to map is dynamic. That said, I can always create a list:

zips_of_interest = ['15210', '15222'] 

How can I iterate through zip_code_geo_dict and remove every entry of zip_code_geo_dict['features'] whose ['properties']['ZCTA5CE10'] value is NOT IN zips_of_interest? It is necessary to keep the over-arching dict structure, such that the filtered version of zip_code_geo_dict['features'] is in the same "spot" as the original (it must remain the list stored under the 'features' key of the larger zip_code_geo_dict object).

I think it's relevant to note that I would like to keep the basic structure of the dict because I am going to pass it to the choropleth method within Folium.


Solution

  • Not sure if this is what you're looking for. The dict you posted doesn't have a features key. I made up an additional dict that would not be removed by the logic you proposed and put both dicts in a list in order to provide a full demonstration.

    def filter_zips(geo_list, zip_list):
        # Build a new list rather than deleting while iterating, which skips elements.
        wanted = set(zip_list)  # a set gives fast membership checks
        return [
            feature for feature in geo_list
            if feature['properties']['ZCTA5CE10'] in wanted
        ]
    
    zip_code_geo_list = [
        {
            'type': 'Feature',
            'geometry': {
                'type': 'MultiPolygon',
                'coordinates': [
                    [-88.252618, 32.92675],
                    [-88.249724, 32.93242],
                    [-88.34043199999999, 32.991199]
                ]
            },
            'properties': {
                'ZCTA5CE10': '35442',
                'AFFGEOID10': '8600000US35442',
                'GEOID10': '35442',
                'ALAND10': 610213891,
                'AWATER10': 10838694
            }
        },
        {
            # Made-up second entry whose zip code IS in zips_of_interest.
            'type': 'Feature',
            'geometry': {
                'type': 'MultiPolygon',
                'coordinates': [
                    [-88.252618, 32.92675],
                    [-88.249724, 32.93242],
                    [-88.34043199999999, 32.991199]
                ]
            },
            'properties': {
                'ZCTA5CE10': '15210',
                'AFFGEOID10': '8600000US15210',
                'GEOID10': '15210',
                'ALAND10': 610213891,
                'AWATER10': 10838694
            }
        },
    ]
    zips_of_interest = ['15210', '15222']
    
    filter_zips(zip_code_geo_list, zips_of_interest)
    

    filter_zips() in this case will return the list with the first dict removed and the second remaining.
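
To keep the outer dict structure described in the question (a 'type' string plus a 'features' list), the same idea can be applied one level up. The following is only a sketch: filter_feature_collection is a hypothetical helper name, the 'FeatureCollection' type value and the shortened coordinates are stand-in assumptions, and the real data would come from the large json file instead.

```python
def filter_feature_collection(geo_dict, zips, key='ZCTA5CE10'):
    """Return a copy of geo_dict whose 'features' list holds only the wanted zips."""
    wanted = set(zips)
    # Shallow copy: keeps 'type' (and any other top-level keys) and leaves
    # the original dict's 'features' list untouched.
    filtered = dict(geo_dict)
    filtered['features'] = [
        feature for feature in geo_dict['features']
        if feature['properties'][key] in wanted
    ]
    return filtered


# Tiny stand-in for the real file; the 'type' value and coordinates are assumptions.
zip_code_geo_dict = {
    'type': 'FeatureCollection',
    'features': [
        {
            'type': 'Feature',
            'geometry': {'type': 'MultiPolygon',
                         'coordinates': [[[[-88.25, 32.92], [-88.24, 32.93]]]]},
            'properties': {'ZCTA5CE10': '35442'},
        },
        {
            'type': 'Feature',
            'geometry': {'type': 'MultiPolygon',
                         'coordinates': [[[[-79.99, 40.40], [-79.98, 40.41]]]]},
            'properties': {'ZCTA5CE10': '15210'},
        },
    ],
}
zips_of_interest = ['15210', '15222']

small_geo_dict = filter_feature_collection(zip_code_geo_dict, zips_of_interest)
print(small_geo_dict.keys())
print([f['properties']['ZCTA5CE10'] for f in small_geo_dict['features']])
```

The returned dict has the same two top-level keys as the original, so it should be usable anywhere the full dict was, including as the geo data handed to Folium, just with far fewer features to render.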