Tags: python, pandas-groupby, geojson, folium

Matching Columns to Geojson File


I'm pretty sure that this problem has a simple solution, but I've been stuck for a while and can't seem to figure it out. Here's what I've done so far:

# import libraries
import folium
import pandas as pd
import numpy as np
import json

# import data
cases = pd.read_csv('COVID-19_Cases__Tests__and_Deaths_by_ZIP_Code.csv')

And then I rename the column I need to match a Geojson file:

cases.rename(columns = {'ZIP Code':'ZIP'}, inplace = True) 

Because the data is listed by week and I only need the most up-to-date numbers, I grouped by ZIP code and took the maximum of each column:

cases_sorted = cases.groupby('ZIP')
maximums = cases_sorted.max()

So far so good. I drop a few unnecessary rows:

maximums_cleaning = maximums.drop('60666',axis = 0)
maximums_cleaned = maximums_cleaning.drop('Unknown',axis = 0)

And my dataframe looks like this: [image: Dataframe]

I then load a map:

# folium is already imported above; center the map on Chicago
map = folium.Map(location=[41.8781, -87.6298], zoom_start=15)
map

Change the column to type String:

maximums_cleaned['ZIP']=maximums_cleaned['ZIP'].astype(str)

And then I get this error:

KeyError: 'ZIP'

I then load my GeoJSON file to layer over the map:

# load GeoJSON
map.choropleth(geo_data="Boundaries - ZIP Codes.geojson",
               data=maximums_cleaned,  # my dataset
               # 'ZIP' is matched against the GeoJSON zip code;
               # 'Case Rate - Cumulative' sets the color of each zip code area
               columns=['ZIP', 'Case Rate - Cumulative'],
               key_on='feature.properties.postalCode',
               fill_color='BuPu', fill_opacity=0.7, line_opacity=0.2,
               legend_name='Cases')

Again I get this error: KeyError: "None of ['ZIP'] are in the columns"

I have tried the code without converting to a string and received the same error when loading my GeoJSON file. I've also tried grouping by different columns, with no success. I think the problem is that the 'ZIP' column is the first column and its header sits lower than the others. This likely needs to be addressed for the GeoJSON file to work with the dataframe, but I cannot figure out how to fix it. Appreciate your input, thanks!


Solution

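  • When you group by 'ZIP', that column becomes the index of the resulting DataFrame. An index is not a column, which is why maximums_cleaned['ZIP'] raises KeyError: 'ZIP'. The lower header you noticed is how pandas displays the name of the index.

    One solution is to copy the index back into a column, for example with reset_index(). See:

    How to convert index of a pandas dataframe into a column?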
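
To make this concrete, here is a minimal sketch of the fix. It assumes a small made-up DataFrame in place of the real CSV (the values are invented and only the column names 'ZIP' and 'Case Rate - Cumulative' come from the question). It shows that 'ZIP' is missing from the columns after groupby, and two ways of getting it back.

```python
import pandas as pd

# Hypothetical stand-in for the weekly CSV; the numbers are made up.
cases = pd.DataFrame({
    "ZIP": ["60601", "60601", "60602", "60602", "Unknown"],
    "Case Rate - Cumulative": [10.0, 25.0, 5.0, 8.0, 1.0],
})

# groupby moves 'ZIP' into the index, so it is no longer a column.
maximums = cases.groupby("ZIP").max()
zip_is_column_before = "ZIP" in maximums.columns

# Drop unwanted rows by index label, then copy the index back into a column.
maximums_cleaned = maximums.drop("Unknown", axis=0).reset_index()
maximums_cleaned["ZIP"] = maximums_cleaned["ZIP"].astype(str)

# Alternative: keep 'ZIP' as a regular column from the start.
alt = cases.groupby("ZIP", as_index=False).max()
alt = alt[alt["ZIP"] != "Unknown"]
```

With 'ZIP' back as a regular column, columns=['ZIP', 'Case Rate - Cumulative'] in the choropleth call should be able to find it. Note that the rows have to be dropped before reset_index() if you drop them by label, as in the question, because afterwards the labels are plain integers.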