Search code examples
python-3.xpandaspandas-explode

How to explode pandas dataframe columns and create new column with the key


I have pandas daraframe where I have multiple column which has list of dictionary. Ex:

col1  col2   combine_col1                 combine_col2
val1  val2   [{'x1':'v1','x2':'v2'}]      [{'x3':'v3','x4':'v4','x5':'v5'}]
val11 val22  [{'x1':'v11','x2':'v22'}]    [{'x3':'v33','x4':'v44','x5':'v55'}]

Inside the combined column the number of keys and their name I don't know.It can be anything.

I want to explode those specific columns which has list of dict and get all the keys as new columns. For this example the output dataframe column names should be col1, col2, x1, x2,x3,x4,x5. Currently I am hardcoding the combine cols and then doing explode operation. But I want that to happen automatically.


Solution

  • If there are only one element lists like in sample data use json_normalize with str[0] for select them:

    import ast
    
    cols = ['combine_col1','combine_col2']
    
    #if necessary
    #df[cols] = df[cols].applymap(ast.literal_eval)
    
    df1 = (df.drop(cols, axis=1)
             .join(pd.concat([pd.json_normalize(df[x].str[0]) for x in cols], axis=1)))
    print (df1)
        col1   col2   x1   x2   x3   x4   x5
    0   val1   val2   v1   v2   v3   v4   v5
    1  val11  val22  v11  v22  v33  v44  v55
    

    EDIT: Solution with append prefix:

    df1 = (df.drop(cols, axis=1)
             .join(pd.concat([pd.json_normalize(df[x].str[0]).add_prefix(f'{x}.') 
                              for x in cols], axis=1)))
    print (df1)
        col1   col2 combine_col1.x1 combine_col1.x2 combine_col2.x3  \
    0   val1   val2              v1              v2              v3   
    1  val11  val22             v11             v22             v33   
    
      combine_col2.x4 combine_col2.x5  
    0              v4              v5  
    1             v44             v55  
    

    If possible multiple values per lists use Series.explode:

    df1 = (df.drop(cols, axis=1)
             .join(pd.concat([pd.json_normalize(df[x].explode()) for x in cols], axis=1)))
    

    df1 = (df.drop(cols, axis=1)
             .join(pd.concat([pd.json_normalize(df[x].explode()).add_prefix(f'{x}.') 
                              for x in cols], axis=1)))