Tags: python, pandas, plotly, json-normalize

How to Create id for Mapping with plotly.express


I have a dataframe "states" that holds each state's child poverty rate, and a JSON file called "us_states". I want to create a choropleth map using plotly express, but I'm struggling to create the id column. Here is my entire code.

import pandas as pd
import json
import plotly.express as px

states = pd.read_csv('https://raw.githubusercontent.com/ngpsu22/Child-Poverty-State-Map/master/poverty_rate_map.csv')

us_states = pd.read_json('https://github.com/ngpsu22/Child-Poverty-State-Map/raw/master/gz_2010_us_040_00_500k.json')

state_id_map = {}
for feature in us_states['features']:
  feature['id'] = feature['properties']['NAME']
  state_id_map[feature['properties']['STATE']] = feature['id']

states['id'] = states['state'].apply(lambda x: state_id_map[x])

But I get this error: KeyError: 'Maine'. Since Maine is the first state in my data frame, the lookup seems to fail on the very first row.

Any suggestions?


Solution

    • us_states.features is a column in which each row is a dict
    • Use pd.json_normalize to extract the dict into a dataframe.
    • 'geometry.coordinates' for each row is a large nested list
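    • It's not clear what the loop is supposed to do. The KeyError suggests that state_id_map is keyed by properties.STATE (which looks like a state code) rather than by the state name, so looking up 'Maine' fails. The data from the two dataframes can instead be joined with pd.merge for easier access.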
    us_states = pd.read_json('https://github.com/ngpsu22/Child-Poverty-State-Map/raw/master/gz_2010_us_040_00_500k.json')
    
    # flatten the dicts into a dataframe; nested keys are joined with '_'
    us_states_features = pd.json_normalize(us_states.features, sep='_')
    
    # the NAME column is addressed with
    us_states_features['properties_NAME']
    
    # join the two dataframes into one
    df = pd.merge(states, us_states_features, left_on='state', right_on='properties_NAME')
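
To check that every state name actually finds a matching feature, the merge can be run with indicator=True. The sketch below is only an illustration under stated assumptions: the two-feature geojson dict and the small states frame are hypothetical stand-ins for the real files, the values are made up, and "child_poverty" is an assumed column name.

```python
import pandas as pd

# Hypothetical stand-in for the GeoJSON file: two features with the
# STATE and NAME properties that the question's loop relies on.
geojson = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature",
         "properties": {"STATE": "23", "NAME": "Maine"},
         "geometry": {"type": "Polygon", "coordinates": []}},
        {"type": "Feature",
         "properties": {"STATE": "25", "NAME": "Massachusetts"},
         "geometry": {"type": "Polygon", "coordinates": []}},
    ],
}

# Hypothetical stand-in for the CSV; values and the "child_poverty"
# column name are made up. "Atlantis" is included to force a mismatch.
states = pd.DataFrame({
    "state": ["Maine", "Massachusetts", "Atlantis"],
    "child_poverty": [10.0, 8.0, 0.0],
})

# Flatten the nested feature dicts; nested keys are joined with "_".
features = pd.json_normalize(geojson["features"], sep="_")

# A left merge with indicator=True adds a "_merge" column that flags
# rows of `states` with no matching feature as "left_only".
merged = states.merge(
    features,
    how="left",
    left_on="state",
    right_on="properties_NAME",
    indicator=True,
)

unmatched = merged.loc[merged["_merge"] == "left_only", "state"].tolist()
print(unmatched)
```

An empty unmatched list on the real data would mean every name in states has a feature. If the goal is only the map, px.choropleth also accepts a featureidkey argument (for example featureidkey='properties.NAME' together with locations='state'), which may make a separate id column unnecessary.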