Nested dictionary to multiindex dataframe where dictionary keys are column labels


Say I have a dictionary that looks like this:

dictionary = {'A': {'a': [1, 2, 3, 4, 5],
                    'b': [6, 7, 8, 9, 1]},
              'B': {'a': [2, 3, 4, 5, 6],
                    'b': [7, 8, 9, 1, 2]}}

and I want a dataframe that looks something like this:

     A     B
     a  b  a  b
  0  1  6  2  7
  1  2  7  3  8
  2  3  8  4  9
  3  4  9  5  1
  4  5  1  6  2

Is there a convenient way to do this? If I try:

In [99]:

DataFrame(dictionary)

Out[99]:
     A               B
a   [1, 2, 3, 4, 5] [2, 3, 4, 5, 6]
b   [6, 7, 8, 9, 1] [7, 8, 9, 1, 2]

I get a dataframe where each element is a list. What I need is MultiIndex columns where each level corresponds to a level of keys in the nested dict, and where each row corresponds to one element of the lists, as shown above. I think I can work out a very crude solution, but I'm hoping there is something simpler.


Solution

  • Pandas wants the MultiIndex values as tuples, not nested dicts. The simplest approach is to convert your dictionary so that each key is an (outer key, inner key) tuple before passing it to DataFrame:

    >>> import pandas
    >>> reform = {(outerKey, innerKey): values
    ...           for outerKey, innerDict in dictionary.items()
    ...           for innerKey, values in innerDict.items()}
    >>> reform
    {('A', 'a'): [1, 2, 3, 4, 5],
     ('A', 'b'): [6, 7, 8, 9, 1],
     ('B', 'a'): [2, 3, 4, 5, 6],
     ('B', 'b'): [7, 8, 9, 1, 2]}
    >>> pandas.DataFrame(reform)
       A     B   
       a  b  a  b
    0  1  6  2  7
    1  2  7  3  8
    2  3  8  4  9
    3  4  9  5  1
    4  5  1  6  2
    
    [5 rows x 4 columns]
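
For reuse, the same conversion can be wrapped in a small function. The following is only a sketch: the name `nested_dict_to_frame` is made up for illustration and is not part of pandas, and it assumes exactly two levels of nesting with equal-length lists. As a possible alternative, `pandas.concat` with a dict of per-key frames and `axis=1` should produce the same column layout.

```python
import pandas as pd


def nested_dict_to_frame(nested):
    """Hypothetical helper: flatten {outer: {inner: values}} into tuple keys.

    Assumes two levels of nesting and lists of equal length.
    """
    flat = {
        (outer_key, inner_key): values
        for outer_key, inner_dict in nested.items()
        for inner_key, values in inner_dict.items()
    }
    # Tuple keys are interpreted as MultiIndex column labels.
    return pd.DataFrame(flat)


dictionary = {'A': {'a': [1, 2, 3, 4, 5],
                    'b': [6, 7, 8, 9, 1]},
              'B': {'a': [2, 3, 4, 5, 6],
                    'b': [7, 8, 9, 1, 2]}}

df = nested_dict_to_frame(dictionary)
print(df.columns.nlevels)         # 2
print(df[('B', 'b')].tolist())    # [7, 8, 9, 1, 2]

# Alternative sketch: build one frame per outer key and concatenate
# column-wise; the dict keys become the outer column level.
alt = pd.concat({key: pd.DataFrame(inner) for key, inner in dictionary.items()},
                axis=1)
```

Selecting `df['A']` then returns the sub-frame with columns `a` and `b`, which is usually the reason for wanting the two-level columns in the first place.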