Search code examples
pythonpandasdataframemulti-indexreindex

reindex multi level index with missing categories


I have a dataframe with two indexes, group and class. I have a dictionary containing additional levels that need to be added in to both those indexes. Specifically I want to add E to the group index. And i want to ensure all g1, g2, and g3 are present in the class index, per group (so add g3 to group A, g1 to group B, g2 and g3 to group C, and g1 and g3 to group D and g1, g2 and g3 to group E. I want to fill where appropriate the total column with zeros

the original dataframe is here

df = pd.DataFrame(data={'group' : ['A','A','B','B','C','D'],
                        'class': ['g1','g2','g2','g3','g1','g2'],
                        'total' : [3,14,12,11,21,9]})

the dictionary (and mapped df) containing all required categories is here

dic = {'group':['A','B','C','D','E'],
       'class' : ['g1','g2','g3']}

and the expected output is here

expectedOutput = pd.DataFrame(data={'group' : ['A','A','A','B','B','B','C','C','C','D','D','D','E','E','E'],
                        'class': ['g1','g2', 'g3','g1','g2', 'g3','g1','g2', 'g3','g1','g2', 'g3','g1','g2', 'g3'],
                        'total' : [3,14,0, 0,12,11,21,0,0,0,9,0, 0,0,0]})

I am having a trouble maintaining the duplicated values when reindexing, but I need to keep them all. Any advice is welcome, thanks a lot


Solution

  • Solution with MultiIndex - created from dict by MultiIndex.from_product with DataFrame.reindex:

    dic = {'group':['A','B','C','D','E'],
           'class' : ['g1','g2','g3']}
    
    mux = pd.MultiIndex.from_product(dic.values(), names=dic)
    
    df = df.set_index(list(dic)).reindex(mux, fill_value=0).reset_index()
    print (df)
       group class  total
    0      A    g1      3
    1      A    g2     14
    2      A    g3      0
    3      B    g1      0
    4      B    g2     12
    5      B    g3     11
    6      C    g1     21
    7      C    g2      0
    8      C    g3      0
    9      D    g1      0
    10     D    g2      9
    11     D    g3      0
    12     E    g1      0
    13     E    g2      0
    14     E    g3      0
    

    Or with left join by DataFrame created by itertools.product:

    from  itertools import product
    
    dicDf = pd.DataFrame(product(*dic.values()), columns=dic)
    
    df = dicDf.merge(df, how='left').fillna({'total':0})
    print (df)
       group class  total
    0      A    g1    3.0
    1      A    g2   14.0
    2      A    g3    0.0
    3      B    g1    0.0
    4      B    g2   12.0
    5      B    g3   11.0
    6      C    g1   21.0
    7      C    g2    0.0
    8      C    g3    0.0
    9      D    g1    0.0
    10     D    g2    9.0
    11     D    g3    0.0
    12     E    g1    0.0
    13     E    g2    0.0
    14     E    g3    0.0