Search code examples
pythonpandaslistunpack

Unpack list of dictionary into separate columns in Pandas


Let's assume I have data which is structured like such:

{ 
    "_id" : 245, 
    "connId" : "3r34b32", 
    "roomList" : [
        {
            "reportId" : 29, 
            "siteId" : 1
        }]
}

How do I go about getting the keys from the dictionary within the list, along with the value in order for it to have this expected output below.

Expected Output:

ID,connID,reportId,siteId

245,3r34b32,29,1

I'm using json_normalize but this does not work with lists, so I need to add an extra bit of code which can unpack the list that contains a dictionary.

Any help will be appreciated.


Solution

  • You have a nested record. You can handle them separately with record_path and them concatenate them with pd.concat()

    root = pd.json_normalize(d).drop('roomList',1)
    nested = pd.json_normalize(d, record_path='roomList')
    output = pd.concat([root, nested],axis=1)
    print(output)
    
       _id   connId  reportId  siteId
    0  245  3r34b32        29       1
    

    Method without explicit column names

    If you don't want to specify a column name explicitly you can do the following. This checks which of the columns after normalization are nested, separately normalizes them and concatenates all of them. I have added another nested JSON for sake of example -

    root = pd.json_normalize(d)
    nested_cols = [i for i in root.columns if isinstance(root[i][0], list)]
    
    l = [root.drop(nested_cols,1),]
    for i in nested_cols:
        l.append(pd.json_normalize(d, record_path=i))
    
    output = pd.concat(l, axis=1)
    print(output)
    
       _id   connId  reportId  siteId  reportId2  siteId2
    0  245  3r34b32        29       1         39        5
    

    NOTE: If some of your nested records have same names, you many want to use some prefix while normalizing them. Check this for more details.