Search code examples
pythonpandasconditional-statementscalculated-columns

Extract values from dictionary and conditionally assign them to columns in pandas


I am trying to extract values from a column of dictionaries in pandas and assign them to their respective columns that already exist. I have hardcoded an example below of the data set that I have:

df_have = pd.DataFrame(
{
    'value_column':[np.nan, np.nan, np.nan]
    ,'date':[np.nan, np.nan, np.nan]
    ,'string_column':[np.nan, np.nan, np.nan]
    , 'dict':[[{'value_column':40},{'date':'2017-08-01'}],[{'value_column':30}, 
{'string_column':'abc'}],[{'value_column':10},{'date':'2016-12-01'}]]
})

df_have

df_want = pd.DataFrame(
    {
        'value_column':[40, 30, 10]
        ,'date':['2017-08-01', np.nan, '2016-12-01']
        ,'string_column':[np.nan, 'abc', np.nan]
        ,'dict':[[{'value_column':40},{'date':'2017-08-01'}],[{'value_column':30}, 
{'string_column':'abc'}],[{'value_column':10},{'date':'2016-12-01'}]]})
df_want

I have managed to extract the values out of the dictionaries using loops:

'''

for row in range(len(df_have)):
    row_holder = df_have.dict[row]
    number_of_dictionaries_in_the_row = len(row_holder)
    
    for dictionary in range(number_of_dictionaries_in_the_row):
        variable_holder = df_have.dict[row][dictionary].keys()
        variable = list(variable_holder)[0]
        value = df_have.dict[row][dictionary].get(variable) 

'''

I now need to somehow conditionally turn df_have into df_want. I am happy to take a completely new approach and recreate the whole thing from scratch. We could even assume that I only have a dataframe with the dictionaries and nothing else.


Solution

  • You could use pandas string methods to pull the data out, although I think it is inefficient nesting data structures within Pandas :

    df_have.loc[:, "value_column"] = df_have["dict"].str.get(0).str.get("value_column")
    df_have.loc[:, "date"] = df_have["dict"].str.get(-1).str.get("date")
    df_have.loc[:, "string_column"] = df_have["dict"].str.get(-1).str.get("string_column")
    
    
     value_column   date       string_column    dict
    0   40        2017-08-01    None    [{'value_column': 40}, {'date': '2017-08-01'}]
    1   30        None          abc     [{'value_column': 30}, {'string_column': 'abc'}]
    2   10        2016-12-01    None    [{'value_column': 10}, {'date': '2016-12-01'}]