Search code examples
pythonpandasdataframedictionaryunpack

How to unpack dictionaries in rows pandas dataframe


I have a dataframe in pandas that looks something like this:

id   column_name_1                         column_name_2                          column_name_3        
1    {acct_nr_1: value, acct_nr_2:value}   {acct_nr_1: value, acct_nr_2:value}    {acct_nr_1: value, acct_nr_2:value}    
2    {acct_nr_1: value, acct_nr_2:value}   {acct_nr_1: value, acct_nr_2:value}    {acct_nr_1: value, acct_nr_2:value}       
3    {acct_nr_1: value, acct_nr_2:value}   {acct_nr_1: value, acct_nr_2:value}    {acct_nr_1: value, acct_nr_2:value}       
4    {acct_nr_1: value, acct_nr_2:value}   {acct_nr_1: value, acct_nr_2:value}    {acct_nr_1: value, acct_nr_2:value} 

Some dictionaries have more than two account numbers, and the values differ. What I want to do is get a dataframe with the same columns plus one. The new column should have the account number (if I only have in the dictionary) and the same if there is more than one account but broken down into several rows. And in each column I want to get only the values.

So the new dataframe should be as follows:

id   account_number        column_name_1   column_name_2   column_name_3        
1    acct_nr_1             value           value           value
1    acct_nr_2             value           value           value
2    acct_nr_1             value           value           value          
2    acct_nr_2             value           value           value
3    acct_nr_1             value           value           value    
3    acct_nr_2             value           value           value
4    acct_nr_1             value           value           value
4    acct_nr_2             value           value           value

I couldn't find a simple (or any) way to solve it.

Thanks!


Solution

  • Stack the df to convert to mutliindex series with two levels, then create a new dataframe from the dictionaries, stack the new df to convert to multindex series with three levels, now unstack on level 1 to reshape back into desired format

    s = df.set_index('id').stack()
    
    (pd.DataFrame([*s], index=s.index)
      .stack()
      .unstack(1)
      .rename_axis(['id', 'account_number'])
      .reset_index())
    

       id account_number column_name_1 column_name_2 column_name_3
    0   1      acct_nr_1         value         value         value
    1   1      acct_nr_2         value         value         value
    2   2      acct_nr_1         value         value         value
    3   2      acct_nr_2         value         value         value
    4   3      acct_nr_1         value         value         value
    5   3      acct_nr_2         value         value         value
    6   4      acct_nr_1         value         value         value
    7   4      acct_nr_2         value         value         value