Search code examples
pythonpandasdataframedictionarynested

Pandas: Filter rows on value in dictionaries in Series


Some columns in my dataframe consist of dictionaries themselves, like this dataframe:

df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Aritra'],
                   'Age': [25, 30, 35],
                   'Location': [
                      {'City': 'Seattle', 'State': 'WA'}, 
                      {'City': 'New York', 'State': 'NY'}, 
                      {'City': 'Albany', 'State': 'NY'}
                    ]
                    })

df
    Name    Age Location
0   Alice   25  {'City': 'Seattle', 'State': 'WA'}
1   Bob     30  {'City': 'New York', 'State': 'NY'}
2   Aritra  35  {'City': 'Albany', 'State': 'NY'}

How can I filter the dataframe on a value in that dictionary?

When I just want one value, I can do this:

df['Location'][0]['State'] 
'WA'

But the issue is that the index is necessary in between the column name and the dictionary key. Thus something like df[df['Location']['State'] == 'NY'] to select all people from NY won't work.

Is there a way to include any index, or must this be done otherwise?

The desired output is

    Name    Age Location
1   Bob     30  {'City': 'New York', 'State': 'NY'}
2   Aritra  35  {'City': 'Albany', 'State': 'NY'}

Solution

  • Use str.get to access the dictionary key and boolean indexing:

    out = df[df['Location'].str.get('State').eq('NY')]
    

    Alternatively, with a list comprehension:

    out = df[[d.get('State')=='NY' for d in df['Location']]]
    

    Output:

         Name  Age                             Location
    1     Bob   30  {'City': 'New York', 'State': 'NY'}
    2  Aritra   35    {'City': 'Albany', 'State': 'NY'}