I have a the following dataframes in python:
1 2 3 4 5
dog dog 0 1 1 0 1
fox 1 0 0 0 0
jumps 0 0 0 1 0
over 1 0 1 0 1
the 0 1 0 0 0
fox dog 0 0 1 1 1
fox 0 0 0 0 0
jumps 0 0 1 0 1
over 0 1 0 0 0
the 0 0 0 1 1
jumps dog 0 0 0 0 0
fox 0 1 0 1 1
jumps 0 0 0 0 1
over 1 0 1 0 0
the 0 0 0 0 0
over dog 0 0 1 0 0
fox 0 1 0 1 1
jumps 0 0 0 0 0
over 0 1 0 1 0
the 1 0 1 0 0
the dog 0 0 1 0 0
fox 0 0 0 0 1
jumps 0 1 0 0 0
over 0 0 1 1 0
the 0 1 1 0 1
1 2 4 5
dog dog 1 0 0 0
fox 0 1 0 1
jumps 0 1 1 0
the 0 0 0 0
horse 1 0 1 0
fox dog 0 0 0 0
fox 0 1 0 1
over 0 0 0 0
the 0 1 0 1
cat 0 0 1 0
You can see that dataframe2 contains multiindexes of dataframe1 but it also contains additional multiindexes like horse and cat. Dataframe 2 also doesn't contain all the columns of dataframe 1 as you can see it misses column 3.
I want to subtract dataframe 2 from dataframe 1 in such a way that the function only subtracts the data which is common in both and ignores the rest and the resulting dataframe is in shape of dataframe 2.
Does any know if pandas provides a builtin way of doing this or do I need to construct a function myself. If so, can you point me in the right direction? Any suggestions are highly appreciated. Thank you.
NOTE: This question is similar to another question I posted here apart from the fact that I am not wanting to compare these, instead wanting to do an arithmetic operation of subtraction.
I believe you simply want something like:
In [23]: (df2 - df1.drop('3', axis=1)).fillna(df2).dropna()
Out[23]:
1 2 4 5
dog dog 1.0 -1.0 0.0 -1.0
fox -1.0 1.0 0.0 1.0
horse 1.0 0.0 1.0 0.0
jumps 0.0 1.0 0.0 0.0
the 0.0 -1.0 0.0 0.0
fox cat 0.0 0.0 1.0 0.0
dog 0.0 0.0 -1.0 -1.0
fox 0.0 1.0 0.0 1.0
over 0.0 -1.0 0.0 0.0
the 0.0 1.0 -1.0 0.0
Pandas already automatically aligns on the index, that's part of it's magic, but you just have to fill/drop nans intelligently.
Whoops, you actually want df1 - df2
, but with the shape of df2
, a little bit more tricky since then fillna(df1)
would prevent us from dropping the right rows, however, you can just use multiply by -1!
In [25]: (df2 - df1.drop('3', axis=1)).fillna(df2).dropna() * -1
Out[25]:
1 2 4 5
dog dog -1.0 1.0 -0.0 1.0
fox 1.0 -1.0 -0.0 -1.0
horse -1.0 -0.0 -1.0 -0.0
jumps -0.0 -1.0 -0.0 -0.0
the -0.0 1.0 -0.0 -0.0
fox cat -0.0 -0.0 -1.0 -0.0
dog -0.0 -0.0 1.0 1.0
fox -0.0 -1.0 -0.0 -1.0
over -0.0 1.0 -0.0 -0.0
the -0.0 -1.0 1.0 -0.0
Or, if those negative zeros bother you:
In [31]: (-df2 + df1.drop('3', axis=1)).fillna(-df2).dropna()
Out[31]:
1 2 4 5
dog dog -1.0 1.0 0.0 1.0
fox 1.0 -1.0 0.0 -1.0
horse -1.0 0.0 -1.0 0.0
jumps 0.0 -1.0 0.0 0.0
the 0.0 1.0 0.0 0.0
fox cat 0.0 0.0 -1.0 0.0
dog 0.0 0.0 1.0 1.0
fox 0.0 -1.0 0.0 -1.0
over 0.0 1.0 0.0 0.0
the 0.0 -1.0 1.0 0.0