Tags: python, pandas, dictionary, dataframe, defaultdict

Concatenating iterable of defaultdicts into DataFrame


Simplified example of what I have now:

from collections import defaultdict

import pandas as pd

d1 = defaultdict(list)
d2 = defaultdict(list)

d1['a'] = [1, 2, 3]
d1['b'] = [True, True, True]

d2['a'] = [4, 5, 6]
d2['b'] = [False, False, False]

Desired result:

   a      b
0  1   True
1  2   True
2  3   True
3  4  False
4  5  False
5  6  False

The line below works, but I'm looking for an alternative that doesn't instantiate a separate DataFrame for every defaultdict.

pd.concat([pd.DataFrame(d) for d in (d1, d2)]).reset_index(drop=True)

Could also start with:

pd.DataFrame([d1, d2])

and convert that to long format.
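
For reference, a minimal sketch of that second route, assuming pandas 1.3 or later (where `explode` accepts a list of columns) and that every list within a row has the same length. The names `wide` and `long_df` are only illustrative:

```python
from collections import defaultdict

import pandas as pd

d1 = defaultdict(list, a=[1, 2, 3], b=[True, True, True])
d2 = defaultdict(list, a=[4, 5, 6], b=[False, False, False])

# One row per dict; each cell holds a whole list.
wide = pd.DataFrame([d1, d2])

# Explode all columns together so the list elements stay aligned row by row.
long_df = wide.explode(list(wide.columns), ignore_index=True)

# explode leaves object-dtype columns; infer_objects tries to restore
# more specific dtypes where it can.
long_df = long_df.infer_objects()
```

This still builds an intermediate frame of lists, so it is not obviously cheaper than the `pd.concat` version.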


Solution

  • You could merge the dicts first and then instantiate a single DataFrame from the result.

    d3 = {k: d1[k] + d2[k] for k in d1}
    d3
    {'a': [1, 2, 3, 4, 5, 6], 'b': [True, True, True, False, False, False]}
    
    df = pd.DataFrame(d3)
    df
       a      b
    0  1   True
    1  2   True
    2  3   True
    3  4  False
    4  5  False
    5  6  False
    

    To automate the merge for any number of dicts:

    dict_list = [d1, d2]  # any iterable of dicts of lists

    d3 = defaultdict(list)
    for d in dict_list:
        for k in d:
            d3[k].extend(d[k])  # append this dict's values to column k
    
    df = pd.DataFrame(d3)
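
The loop above can be wrapped in a small helper so it works on any iterable, including a generator. This is only a sketch: `dicts_to_frame` is a hypothetical name, not a pandas function, and it assumes every dict carries the same keys.

```python
from collections import defaultdict

import pandas as pd


def dicts_to_frame(dicts):
    """Merge an iterable of dict-of-lists into one DataFrame (hypothetical helper)."""
    merged = defaultdict(list)
    for d in dicts:
        for key, values in d.items():
            merged[key].extend(values)
    # Only one DataFrame is built, at the very end. If some dict lacks a key
    # the others have, the merged columns differ in length and
    # pd.DataFrame raises ValueError.
    return pd.DataFrame(merged)


d1 = {'a': [1, 2, 3], 'b': [True, True, True]}
d2 = {'a': [4, 5, 6], 'b': [False, False, False]}

# A generator is fine here; the helper consumes it once.
df = dicts_to_frame(d for d in (d1, d2))
```

Iterating with `d.items()` also avoids the side effect of indexing a defaultdict with a missing key, which would silently insert an empty list.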