Consider I have two sets 'A' and 'B', how do I create a set C= A minus B in pandas. Here A and B are dataframes. A is a dataframe containing First name and Last name as multiindex. B has integers as index. First name and Last name are columns in B.
I tried by converting multindex of A to column of A by A['index']=A.index and later tried to merge B and A.But it is not working.
A:
B:
The columns of B (f_Name and l_name) are multiindex of A.
I want all rows in A for which f_name and l_name does not exist in B as output. I have tried following code:
A['index']=A.index
my_df=pd.merge(A,B,left_on=['F_name','L_name'],right_index=True,how='left']
ans_df=A[~A.index.isin(my_df.index)]
but the len(and_df) is same as len(A) which is not correct. The length of ans_df should be less than that of A as few f_name and l_name exist in B.
Here are the dataframes A and B
import pandas as pd
import numpy as np
A
Age Gender
F_name L_name
Josh Crammer 25 M
John Smith 29 M
Mellisa Simpson 32 F
Ahemed Khan 26 M
Frank J 25 M
Charles Brown 26 M
William Gibson 26 M
B
F_name L_name
0 Josh Crammer
2 Mellisa Simpson
4 Frank J
5 Charles Brown
6 William Gibson
What we can do is reset the index of A and create columns in place like this.
A.reset_index(level=A.index.names, inplace=True)
A
F_name L_name Age Gender
0 Josh Crammer 25 M
1 John Smith 29 M
2 Mellisa Simpson 32 F
3 Ahemed Khan 26 M
4 Frank J 25 M
5 Charles Brown 26 M
6 William Gibson 26 M
All that needs to be done now is to add a not in condition to fetch the rows we require:
A[~((A.F_name.isin(B.F_name)) & (A.L_name.isin(B.L_name)))]
F_name L_name Age Gender
1 John Smith 29 M
3 Ahemed Khan 26 M