Search code examples
pythonpython-3.xpandasdataframepandas-explode

Expand embedded list of dictionnaries in a DataFrame as new columns of the DataFrame


I have a Pandas DataFrame looking like:

import pandas as pd
print(pd.__version__)

df0 = pd.DataFrame([
 [12, None, [{'dst': '925', 'object': 'Lok. Certification', 'admin': 'valid'},
             {'dst': '935', 'object': 'Lok. Administration', 'admin': 'true'},
             {'dst': '944', 'object': 'Lok. Customer', 'admin': 'false'},
             {'dst': '945', 'object': 'Lok. Customer', 'admin': 'false'},
             {'dst': '954', 'object': 'Lok. Certification-C', 'admin': 'invalid'},
             {'dst': '956', 'object': 'Lok. Certification', 'admin': 'valid'}]],
 [13,'wXB', [{'dst': '986', 'object': 'Fral_heater', 'admin': 'valid'},
             {'dst': '987', 'object': 'Fral_cond.', 'admin': 'valid'}]],
 ])

Each of the list in the column 2 is having the exact same keys (dst, object and admin).

There can be between 0 (empty []) and 100 lists for each row of the df0.

I wish I could expand the df0 DataFrame to look like this:

columns = ['id', 'name', 'dst', 'object', 'admin']

df_wanted
Out[416]: 
     id name  dst  object                admin
    12  None  925 'Lok. Certification'   'valid'
    12  None  935 'Lok. Administration'  'true'
    12  None  944 'Lok. Customer'        'false'
    12  None  945 'Lok. Customer'        'false'
    12  None  955 'Lok. Certification-C' 'invalid'
    12  None  956 'Lok. Certification'   'valid'
    13   wXB  987 'Lok. Fral_heater'     'valid'
    13   wXB  986 'Lok. Fral_cond.'      'valid'
    ...

Notice that the two first columns, id and name, are replicated along the rows to fit the number of elements within their list.

(The dst column must be cast to an int using .astype(int) at the end.)

How could I achieve that?

Info:

Python 3.10.4
pd.__version__
'1.4.2'

Solution

  • You can explode the column first, then convert the dictionaries to columns:

    df0 = df0.explode(2, ignore_index=True)    
    df0 = pd.concat([df0, df0[2].apply(pd.Series)], axis=1).drop(columns=2)