Search code examples
pythonpandasgroup-bypandas-explode

Python - How to extract values ​from different list of dictionaries in the rows of the dataframe


I have data frame as like df:

    id        features                             
    100     [{'city': 'Rio'},  {'destination': '2'}]        
    110     [{'city': 'Sao Paulo'}]     
    135     [{'city': 'Recife'}, {'destination': '45'}]
    145     [{'city': 'Munich'}, {'destination': '67'}]
    167     [{'city': 'Berlin'}, {'latitude':'56'}, {'longitude':'30'}]

I have to extract column name and values from features column to separate columns as like:

    id      city          destination   latitude       longitude                            
    100     'Rio'          '2'            NaN            NaN  
    110     'Sao Paulo'    NaN            NaN            NaN  
    135     'Recife'       '45'           NaN            NaN  
    145     'Munich'       '67'           NaN            NaN  
    167     'Berlin'       NaN            '56'           '30'

I tried to do it with usage idea as like:

1st method to extract:

df = df.explode('features').reset_index(drop = True)
result = pd.concat([df.drop(columns='features'), 
           pd.json_normalize(df['features'])], axis=1)

result is only id column.

2nd method:

df = df.explode('features').reset_index(drop = True)
df2 = df.set_index('id')
df2 = df2['features'].astype('str')
df2 = df2.apply(lambda x: ast.literal_eval(x))
df2 = df2.apply(pd.Series)
result = df2.reset_index()

result is very closed what I need:

    id      city          destination        latitude      longitude                             
    100     'Rio'           NaN                NaN            NaN  
    100      NaN            '2'                NaN            NaN  
    110     'Sao Paulo'     NaN                NaN            NaN  
    135     'Recife'        NaN                NaN            NaN  
    135      NaN            '45'               NaN            NaN  
    145     'Munich'        NaN                NaN            NaN  
    145     'Munich'        '67'               NaN            NaN  
    167     'Berlin'        NaN                NaN            NaN 
    167      NaN            NaN                '56'           NaN  
    167      NaN            NaN                NaN          '30' 

How is possible to achieve an expected result in view of:

    id      city          destination   latitude       longitude                            
    100     'Rio'          '2'            NaN            NaN  
    110     'Sao Paulo'    NaN            NaN            NaN  
    135     'Recife'       '45'           NaN            NaN  
    145     'Munich'       '67'           NaN            NaN  
    167     'Berlin'       NaN            '56'           '30'

Thanks


Solution

  • Try:

    df = pd.concat(
        [
            df,
            df.pop("features").apply(
                lambda x: pd.Series({k: v for d in x for k, v in d.items()})
            ),
        ],
        axis=1,
    )
    print(df)
    

    Prints:

        id       city destination latitude longitude
    0  100        Rio           2      NaN       NaN
    1  110  Sao Paulo         NaN      NaN       NaN
    2  135     Recife          45      NaN       NaN
    3  145     Munich          67      NaN       NaN
    4  167     Berlin         NaN       56        30