Tags: python, python-3.x, pandas, dataframe, data-conversion

Convert a nested dictionary with lists of dictionaries to a Pandas DataFrame


I have a dictionary like this:

{
    'person1' : {
        'category1' : [
            {'time' : time1, 'property1' : value1, 'property2' : value2, ...},
            {'time' : time2, 'property1' : value3, 'property2' : value4, ...},
            ...
        ],
        'category2' : [
            {...}, 
            ...
        ],
        ...
    },
    'person2' : {
        'category1' : [
            {'time' : time3, 'property1' : value5, 'property2' : value6, ...},
            {'time' : time4, 'property1' : value7, 'property2' : value8, ...},
        ],
        'category2' : [
            {...},
            ...
        ],
        ...
    }
}

that I'd like to convert to a DataFrame like this:

'time'    'person'    'category'    'property1'    'property2'  ...  'propertyn'
--------------------------------------------------------------------------------
 time1     person1     category1      value1         value2     ...     value
 time2     person1     category1      value3         value4     ...     value
  ...      person1     category2       ...            ...       ...      ...
                                        .
                                        .
                                        .
 time3     person2     category1      value5         value6     ...     value
 time4     person2     category1      value7         value8     ...     value
  ...      person2     category2       ...            ...       ...      ...
                                        .
                                        .
                                        .
 timen     personn     category1      valuen         valuen     ...     value

I thought about traversing the dictionary with for-loops, but I'd like to avoid them to keep the conversion as efficient as possible. I've also looked into pd.DataFrame.from_dict(), but on its own it doesn't flatten the inner lists of dictionaries into rows.

This answer is the closest I've seen, but their inner lists are just lists of values instead of a list of dictionaries like mine.

I would appreciate any help with this conversion!


Solution

  • You can create a small DataFrame for each person/category pair and concatenate them (here d is the nested dictionary and pandas is imported as pd):

    pd.concat([pd.DataFrame(u).assign(category=c, person=p) for p, data in d.items() for c, u in data.items()])
    

    Output:

        time property1 property2   category   person
    0  time1    value1    value2  category1  person1
    1  time2    value3    value4  category1  person1
    0  time3    value5    value6  category1  person2
    1  time4    value7    value8  category1  person2
    0  time3    value5    value6  category2  person2
    1  time4    value7    value8  category2  person2
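
For anyone who wants to run this end to end, here is a minimal, self-contained sketch of the approach above. The sample dictionary and its values are made up for illustration (they are not the asker's data). It also adds two things the one-liner doesn't do: ignore_index=True, so the row index doesn't restart at 0 for every group, and a column reorder to match the layout asked for in the question. The second half shows one possible alternative, flattening everything into a single list of row dicts and building one DataFrame from it. I'd expect that to avoid the overhead of many small frames, but that is an assumption, so time both on your own data.

```python
import pandas as pd

# Hypothetical sample data; the values are placeholders, not from the question.
d = {
    "person1": {
        "category1": [
            {"time": 1, "property1": "a", "property2": "b"},
            {"time": 2, "property1": "c", "property2": "d"},
        ],
        "category2": [
            {"time": 3, "property1": "e", "property2": "f"},
        ],
    },
    "person2": {
        "category1": [
            {"time": 4, "property1": "g", "property2": "h"},
        ],
    },
}

# One small DataFrame per (person, category) pair, tagged and concatenated.
frames = [
    pd.DataFrame(records).assign(person=person, category=category)
    for person, categories in d.items()
    for category, records in categories.items()
]
df = pd.concat(frames, ignore_index=True)  # fresh 0..n-1 index

# Move the identifying columns to the front, as in the desired layout.
lead = ["time", "person", "category"]
df = df[lead + [c for c in df.columns if c not in lead]]

# Alternative: flatten to one list of row dicts and build a single DataFrame.
rows = [
    {"person": person, "category": category, **record}
    for person, categories in d.items()
    for category, records in categories.items()
    for record in records
]
df_flat = pd.DataFrame(rows)[df.columns.tolist()]

print(df)
```

Both versions still iterate over the dictionary in Python (the comprehensions are loops), which is hard to avoid with a structure nested like this. The difference is only in how many DataFrame objects get created along the way.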