Search code examples
pythonpandasapinested-fields

Extract nested values from data frame using python


I've extracted the data from API response and created a dictionary function:

def data_from_api(a):
    dictionary = dict(
    data = a['number']
    ,created_by = a['opened_by']
    ,assigned_to = a['assigned']
    ,closed_by = a['closed']
    )
    return dictionary

and then to df (around 1k records):

raw_data = []
for k in data['resultsData']:
    records = data_from_api(k)
    raw_data.append(records)

I would like to create a function allows to extract the nested fields {display_value} in the columns in the dataframe. I need only the names like John Snow, etc. Please see below:

enter image description here

How to create a function extracts the display values for those fields? I've tried to create something like:

df = pd.DataFrame.from_records(raw_data)

def get_nested_fields(nested):
if isinstance(nested, dict):
    return nested['display_value']
else:
    return ''

df['created_by'] = df['opened_by'].apply(get_nested_fields)
df['assigned_to'] = df['assigned'].apply(get_nested_fields)
df['closed_by'] = df['closed'].apply(get_nested_fields)

but I'm getting an error:

KeyError: 'created_by'

Could you please help me?


Solution

  • You can use .str and get() like below. If the key isn't there, it'll write None.

    df = pd.DataFrame({'data':[1234, 5678, 5656], 'created_by':[{'display_value':'John Snow', 'link':'a.com'}, {'display_value':'John Dow'}, {'my_value':'Jane Doe'}]})
    df['author'] = df['created_by'].str.get('display_value')
    

    output

       data                                       created_by     author
    0  1234  {'display_value': 'John Snow', 'link': 'a.com'}  John Snow
    1  5678                    {'display_value': 'John Dow'}   John Dow
    2  5656                         {'my_value': 'Jane Doe'}       None