Search code examples
python-3.xpandasdataframejson-normalize

How to convert a nested dict, to a pandas dataframe


I'm trying to convert a dataframe that has inside other dataframe like:

{
  'id': 3241234,
  'data': {
           'name':'carol',
           'lastname': 'netflik',
           'office': {
                       'num': 3543,
                       'department': 'trigy'
                    }
        }


}

I tried to use:

pd.DataFrame.from_dict(data)

But the result dataframe looks like:

               id                                  data
lastname  3241234                               netflik
name      3241234                                 carol
office    3241234  {'num': 3543, 'department': 'trigy'}

Any idea?


Solution

  • Loading a JSON/dict:

    import pandas as pd
    
    data = {'id': 3241234, 'data': {'name': 'carol', 'lastname': 'netflik', 'office': {'num': 3543, 'department': 'trigy'}}}
    
    df = pd.json_normalize(data)
    
    # display(df)
            id data.name data.lastname  data.office.num data.office.department
    0  3241234     carol       netflik             3543                  trigy
    

    If the dataframe has column of dicts

    # dataframe with column of dicts
    df = pd.DataFrame({'col2': [1, 2, 3], 'col': [data, data, data]})
    
    # display(df)
       col2                                                                                                                col
    0     1  {'id': 3241234, 'data': {'name': 'carol', 'lastname': 'netflik', 'office': {'num': 3543, 'department': 'trigy'}}}
    1     2  {'id': 3241234, 'data': {'name': 'carol', 'lastname': 'netflik', 'office': {'num': 3543, 'department': 'trigy'}}}
    2     3  {'id': 3241234, 'data': {'name': 'carol', 'lastname': 'netflik', 'office': {'num': 3543, 'department': 'trigy'}}}
    
    # normalize the column of dicts
    normalized = pd.json_normalize(df['col'])
    
    # join the normalized column to df
    df = df.join(normalized).drop(columns=['col'])
    
    # display(df)
       col2       id data.name data.lastname  data.office.num data.office.department
    0     1  3241234     carol       netflik             3543                  trigy
    1     2  3241234     carol       netflik             3543                  trigy
    2     3  3241234     carol       netflik             3543                  trigy
    

    If the dataframe has a column of lists with dicts

    • The dicts need to be removed from the lists with .explode
    data = [{'id': 3241234, 'data': {'name': 'carol', 'lastname': 'netflik', 'office': {'num': 3543, 'department': 'trigy'}}}]
    
    df = pd.DataFrame({'col2': [1, 2, 3], 'col': [data, data, data]})
    
    # display(df)
       col2                                                                                                                  col
    0     1  [{'id': 3241234, 'data': {'name': 'carol', 'lastname': 'netflik', 'office': {'num': 3543, 'department': 'trigy'}}}]
    1     2  [{'id': 3241234, 'data': {'name': 'carol', 'lastname': 'netflik', 'office': {'num': 3543, 'department': 'trigy'}}}]
    2     3  [{'id': 3241234, 'data': {'name': 'carol', 'lastname': 'netflik', 'office': {'num': 3543, 'department': 'trigy'}}}]
    
    # explode the lists
    df = df.explode('col', ignore_index=True)
    
    # remove and normalize the column of dicts
    normalized = pd.json_normalize(df.pop('col'))
    
    # join the normalized column to df
    df = df.join(normalized)