Tags: python, python-3.x, pandas, dictionary, multi-index

Python Pandas - Repeat dict to fit in MultiIndex Dataframe


I have the structure of a MultiIndex DataFrame that I want to use (inspired by the documentation). I want to fill it, for each index entry, with the contents of a dict.

General structure of the MultiIndex DataFrame:

import pandas as pd

def mklbl(prefix, n):
    return ["%s%s" % (prefix, i) for i in range(n)]

miindex = pd.MultiIndex.from_product(
    [mklbl("X", 2), mklbl("Y", 2), mklbl("Z", 2)]
)

columns = ['A', 'B', 'C']

dfmi = (
    pd.DataFrame(
        # Placeholder data from the documentation. To replace??
        # np.arange(len(miindex) * len(columns)).reshape(
        #     (len(miindex), len(columns))
        # ),
        index=miindex,
        columns=columns,
    )
    .sort_index()
    .sort_index(axis=1)
)

I would like to replace the commented-out code above with repeated copies of the following dictionary, one copy per MultiIndex entry,

my_dict = {'A': [False, False, True, True], 
           'B': [False, True, False, True], 
           'C': [0, 0, 0, 0]}

such that I have as a final result:

               A       B   C
X0 Y0 Z0   False   False   0
           False   True    0
           True    False   0
           True    True    0
      Z1   False   False   0
           False   True    0
           True    False   0
           True    True    0
   Y1 Z0   False   False   0
           False   True    0
           True    False   0
           True    True    0
      Z1   False   False   0
           False   True    0
           True    False   0
           True    True    0
X1 Y0 Z0   False   False   0
           False   True    0
           True    False   0
           True    True    0
      Z1   False   False   0
           False   True    0
           True    False   0
           True    True    0
   Y1 Z0   False   False   0
           False   True    0
           True    False   0
           True    True    0
      Z1   False   False   0
           False   True    0
           True    False   0
           True    True    0

I tried playing around with pd.concat(), without success. Is this possible?


Solution

  • You can combine pd.concat and itertools.product, using each label combination as a key:

    from itertools import product
    
    prod = product(mklbl("X", 2), mklbl("Y", 2), mklbl("Z", 2))
    
    tmp = pd.DataFrame(my_dict)
    
    out = pd.concat({p: tmp for p in prod})
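
As a self-contained sketch of the concat approach: each tuple key passed to pd.concat becomes three outer index levels, and the 0..3 index of tmp is kept as a fourth, innermost level. The names `labels` and `trimmed` are introduced here for illustration only, and the final `droplevel` step is an assumption about what the question wants (a three-level index), not part of the answer's code.

```python
from itertools import product

import pandas as pd

my_dict = {"A": [False, False, True, True],
           "B": [False, True, False, True],
           "C": [0, 0, 0, 0]}

# Same labels as mklbl("X", 2), mklbl("Y", 2), mklbl("Z", 2) in the question.
labels = [[f"{prefix}{i}" for i in range(2)] for prefix in "XYZ"]

tmp = pd.DataFrame(my_dict)

# One copy of tmp per (X, Y, Z) combination; the tuple keys form the outer levels.
out = pd.concat({key: tmp for key in product(*labels)})

# Optional: drop the innermost 0..3 level to get the three-level layout
# shown in the question (the index then contains repeated entries).
trimmed = out.droplevel(-1)
```

Note that after dropping the innermost level, each (X, Y, Z) entry appears four times, so label-based lookups return four rows per entry.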
    

    Alternatively, if you already have dfmi, use a cross-merge:

    out = (dfmi[[]]
     .reset_index()
     .merge(pd.DataFrame(my_dict), how='cross')
    )
    
    out = (out
     .set_index(list(out)[:dfmi.index.nlevels])
     .rename_axis(index=[None]*dfmi.index.nlevels)
    )
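
The cross-merge variant can be tried on its own with the sketch below. It assumes pandas 1.2 or later (where `how='cross'` is available) and builds an empty `dfmi` directly from the index, which is a simplification of the question's setup rather than the asker's exact code. The intermediate name `n` is illustrative.

```python
import pandas as pd

my_dict = {"A": [False, False, True, True],
           "B": [False, True, False, True],
           "C": [0, 0, 0, 0]}

miindex = pd.MultiIndex.from_product(
    [[f"{prefix}{i}" for i in range(2)] for prefix in "XYZ"]
)

# Assumed stand-in for the question's dfmi: same index and columns, no data.
dfmi = pd.DataFrame(index=miindex, columns=["A", "B", "C"])

n = dfmi.index.nlevels

# dfmi[[]] keeps only the index; reset_index turns its levels into columns,
# and the cross merge pairs every index row with every row of the dict frame.
out = (dfmi[[]]
       .reset_index()
       .merge(pd.DataFrame(my_dict), how="cross"))

# Move the former index columns back into the index and clear their names.
out = (out
       .set_index(list(out)[:n])
       .rename_axis(index=[None] * n))
```

Unlike the concat approach, this result has only the three original index levels, with each entry repeated once per row of the dictionary.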
    

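    Output of the concat approach (the cross-merge variant produces the same rows without the innermost 0 to 3 level):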

                    A      B  C
    X0 Y0 Z0 0  False  False  0
             1  False   True  0
             2   True  False  0
             3   True   True  0
          Z1 0  False  False  0
             1  False   True  0
             2   True  False  0
             3   True   True  0
       Y1 Z0 0  False  False  0
             1  False   True  0
             2   True  False  0
             3   True   True  0
          Z1 0  False  False  0
             1  False   True  0
             2   True  False  0
             3   True   True  0
    X1 Y0 Z0 0  False  False  0
             1  False   True  0
             2   True  False  0
             3   True   True  0
          Z1 0  False  False  0
             1  False   True  0
             2   True  False  0
             3   True   True  0
       Y1 Z0 0  False  False  0
             1  False   True  0
             2   True  False  0
             3   True   True  0
          Z1 0  False  False  0
             1  False   True  0
             2   True  False  0
             3   True   True  0