Some columns in my dataframe consist of dictionaries themselves, like this dataframe:
df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Aritra'],
'Age': [25, 30, 35],
'Location': [
{'City': 'Seattle', 'State': 'WA'},
{'City': 'New York', 'State': 'NY'},
{'City': 'Albany', 'State': 'NY'}
]
})
df
Name Age Location
0 Alice 25 {'City': 'Seattle', 'State': 'WA'}
1 Bob 30 {'City': 'New York', 'State': 'NY'}
2 Aritra 35 {'City': 'Albany', 'State': 'NY'}
How can I filter the dataframe on a value in that dictionary?
When I just want one value, I can do this:
df['Location'][0]['State']
'WA'
But the issue is that the index is necessary in between the column name and the dictionary key. Thus something like df[df['Location']['State'] == 'NY']
to select all people from NY won't work.
Is there a way to include any index, or must this be done otherwise?
The desired output is
Name Age Location
1 Bob 30 {'City': 'New York', 'State': 'NY'}
2 Aritra 35 {'City': 'Albany', 'State': 'NY'}
Use str.get
to access the dictionary key and boolean indexing:
out = df[df['Location'].str.get('State').eq('NY')]
Alternatively, with a list comprehension:
out = df[[d.get('State')=='NY' for d in df['Location']]]
Output:
Name Age Location
1 Bob 30 {'City': 'New York', 'State': 'NY'}
2 Aritra 35 {'City': 'Albany', 'State': 'NY'}