Search code examples
pandasdataframeunionnanmulti-index

fill in missing DataFrame indices


Given two pandas dataframes dfa and dfb, how can I ensure the MultiIndex of each DataFrame contains all rows from the other?

In [147]: dfa
Out[147]: 
        c
a b      
0 5  10.0
1 6  11.0
2 7  12.0
3 8  13.5
4 9  14.0

In [148]: dfb
Out[148]: 
      c
a b    
0 5  10
2 7  12
3 8  13
4 9  14

Here, dfb lacks index (1, 6):

In [149]: dfa - dfb
Out[149]: 
       c
a b     
0 5  0.0
1 6  NaN
2 7  0.0
3 8  0.5
4 9  0.0

... but dfa may also lack indices from dfb. The value should be 0 where we insert a missing index in each dataframe.

In other words, each DataFrame's index should be the union of the two MultiIndexes, where the added row gets a value of 0.


Solution

  • I think you need DataFrame.sub with parameter fill_value if need replace NaN to some value:

    df = dfa.sub(dfb, fill_value=0)
    print (df)
            c
    a b      
    0 5   0.0
    1 6  11.0
    2 7   0.0
    3 8   0.5
    4 9   0.0
    
    df = dfb.sub(dfa, fill_value=0)
    print (df)
          c
    a b    
    0 5  10
    1 6   0
    2 7  12
    3 8  13
    4 9  14
    

    Or if need union of indexes add reindex:

    mux = dfa.index.union(dfb.index)
    print (mux)
    MultiIndex(levels=[[0, 1, 2, 3, 4], [5, 6, 7, 8, 9]],
               labels=[[0, 1, 2, 3, 4], [0, 1, 2, 3, 4]],
               names=['a', 'b'],
               sortorder=0)
    
    print (dfa.reindex(mux, fill_value=0))
            c
    a b      
    0 5  10.0
    1 6  11.0
    2 7  12.0
    3 8  13.5
    4 9  14.0
    
    print (dfb.reindex(mux, fill_value=0))
          c
    a b    
    0 5  10
    1 6   0
    2 7  12
    3 8  13
    4 9  14