Search code examples
pythonpandasdataframenested-lists

Pandas DataFrame from nested list of dict


I'm recieving data from facebook api to list of dictionaries. I want to create pandas DataFrame from this list.

data = [{'account_id': '1234', 'account_name': 'account1', 'adset_id': '238', 'adset_name': 'Audio-Video', 'impressions': '24140', 'actions': [{'action_type': 'onsite_conversion.post_save', 'value': '1'}, {'action_type': 'link_click', 'value': '1081'}], 'date_start': '2022-11-01', 'date_stop': '2022-11-01'},
{'account_id': '1234', 'account_name': 'account2', 'adset_id': '591', 'adset_name': 'Moto', 'impressions': '16657', 'actions': [{'action_type': 'link_click', 'value': '620'}, {'action_type': 'post', 'value': '1'}], 'date_start': '2022-11-01', 'date_stop': '2022-11-01'} ]

But one of the column (actions) is nested. I want to "unpack" this to new columns.

I'm trying

df = pandas.DataFrame.from_dict(res)

The result is:

  account_id account_name adset_id   adset_name impressions                                                                                                                  actions  date_start   date_stop
0       1234     account1      238  Audio-Video       24140  [{'action_type': 'onsite_conversion.post_save', 'value': '1'}, {'action_type': 'link_click', 'value': '1081'}]  2022-11-01  2022-11-01
1       1234     account2      591         Moto       16657  [{'action_type': 'link_click', 'value': '620'}, {'action_type': 'post', 'value': '1'}]                          2022-11-01  2022-11-01

But the desired result should be:

  account_id account_name adset_id   adset_name impressions  onsite_conversion.post_save  link_click         post  date_start   date_stop
0       1234     account1      238  Audio-Video       24140                            1        1081            0  2022-11-01  2022-11-01
1       1234     account2      591         Moto       16657                            0         620            1  2022-11-01  2022-11-01

What is the most effective way to do it with pandas? Thank you!


Solution

  • IIUC, you can try to json_normalize and pivot:

    df = pd.DataFrame(data)
    
    s = df.pop('actions').explode()
    
    df = df.join(pd
       .json_normalize(s).set_axis(s.index)
       .reset_index()
       .pivot_table(index='index', columns='action_type', values='value', fill_value=0)
     )
    
    print(df)
    

    Output:

      account_id account_name adset_id   adset_name impressions  date_start  \
    0       1234     account1      238  Audio-Video       24140  2022-11-01   
    1       1234     account2      591         Moto       16657  2022-11-01   
    
        date_stop  link_click  onsite_conversion.post_save  post  
    0  2022-11-01        1081                            1     0  
    1  2022-11-01         620                            0     1