Given two pandas dataframes dfa
and dfb
, how can I ensure the MultiIndex of each DataFrame contains all rows from the other?
In [147]: dfa
Out[147]:
c
a b
0 5 10.0
1 6 11.0
2 7 12.0
3 8 13.5
4 9 14.0
In [148]: dfb
Out[148]:
c
a b
0 5 10
2 7 12
3 8 13
4 9 14
Here, dfb
lacks index (1, 6):
In [149]: dfa - dfb
Out[149]:
c
a b
0 5 0.0
1 6 NaN
2 7 0.0
3 8 0.5
4 9 0.0
... but dfa
may also lack indices from dfb
. The value should be 0
where we insert a missing index in each dataframe.
In other words, each DataFrame's index should be the union of the two MultiIndexes, where the added row gets a value of 0.
I think you need DataFrame.sub
with parameter fill_value
if need replace NaN
to some value:
df = dfa.sub(dfb, fill_value=0)
print (df)
c
a b
0 5 0.0
1 6 11.0
2 7 0.0
3 8 0.5
4 9 0.0
df = dfb.sub(dfa, fill_value=0)
print (df)
c
a b
0 5 10
1 6 0
2 7 12
3 8 13
4 9 14
Or if need union
of indexes add reindex
:
mux = dfa.index.union(dfb.index)
print (mux)
MultiIndex(levels=[[0, 1, 2, 3, 4], [5, 6, 7, 8, 9]],
labels=[[0, 1, 2, 3, 4], [0, 1, 2, 3, 4]],
names=['a', 'b'],
sortorder=0)
print (dfa.reindex(mux, fill_value=0))
c
a b
0 5 10.0
1 6 11.0
2 7 12.0
3 8 13.5
4 9 14.0
print (dfb.reindex(mux, fill_value=0))
c
a b
0 5 10
1 6 0
2 7 12
3 8 13
4 9 14