Search code examples
pythonjsonpandasdataframenested

Create many new column df, having a nested column inside that df


I have a data frame that looks like this:

a = {'price': [1, 2],
     'nested_column': 
    [[{'key': 'code', 'value': 'A', 'label': 'rif1'},
    {'key': 'datemod', 'value': '31/09/2022', 'label': 'mod'}],
    [{'key': 'code', 'value': 'B', 'label': 'rif2'},
    {'key': 'datemod', 'value': '31/08/2022', 'label': 'mod'}]]}

df = pd.DataFrame(data=a)

My expected output should look like this:

b = {'price': [1, 2],
    'code':["A","B"],
    'datemod':["31/09/2022","31/08/2022"]}

exp_df = pd.DataFrame(data=b)

I tried some lines of code, that unfortunately don't do the job, that look like this:

df = pd.concat([df.drop(['nested_column'], axis=1), df['nested_column'].apply(pd.Series)], axis=1)
df = pd.concat([df.drop([0], axis=1), df[0].apply(pd.Series)], axis=1)

Solution

  • You can pop and explode your column to feed to json_normalize, then pivot according to the desired key/value and join:

    # pop the json column and explode to rows
    s = df.pop('nested_column').explode()
    
    df = df.join(pd.json_normalize(s)    # normalize dictionary to columns
                   .assign(idx=s.index)  # ensure same index
                   .pivot(index='idx', columns='key', values='value')
                 )
    

    output:

       price code     datemod
    0      1    A  31/09/2022
    1      2    B  31/08/2022