Search code examples
pythonpandassetoperation

Take two sets 'A' and 'B' , how to create a set C= A minus B in pandas


Consider I have two sets 'A' and 'B', how do I create a set C= A minus B in pandas. Here A and B are dataframes. A is a dataframe containing First name and Last name as multiindex. B has integers as index. First name and Last name are columns in B.

I tried by converting multindex of A to column of A by A['index']=A.index and later tried to merge B and A.But it is not working.

A:

csv for A.csv

B:

csv for B.csv

The columns of B (f_Name and l_name) are multiindex of A.

I want all rows in A for which f_name and l_name does not exist in B as output. I have tried following code:

A['index']=A.index

my_df=pd.merge(A,B,left_on=['F_name','L_name'],right_index=True,how='left'] 

ans_df=A[~A.index.isin(my_df.index)]

but the len(and_df) is same as len(A) which is not correct. The length of ans_df should be less than that of A as few f_name and l_name exist in B.


Solution

  • Here are the dataframes A and B

    import pandas as pd
    import numpy as np
    
    A
                   Age  Gender
    F_name  L_name      
    Josh    Crammer 25  M
    John    Smith   29  M
    Mellisa Simpson 32  F
    Ahemed  Khan    26  M
    Frank   J       25  M
    Charles Brown   26  M
    William Gibson  26  M
    
    B
        F_name  L_name
    0   Josh    Crammer
    2   Mellisa Simpson
    4   Frank   J
    5   Charles Brown
    6   William Gibson
    

    What we can do is reset the index of A and create columns in place like this.

    A.reset_index(level=A.index.names, inplace=True)
    A
        F_name  L_name  Age Gender
    0   Josh    Crammer 25  M
    1   John    Smith   29  M
    2   Mellisa Simpson 32  F
    3   Ahemed  Khan    26  M
    4   Frank   J       25  M
    5   Charles Brown   26  M
    6   William Gibson  26  M
    

    All that needs to be done now is to add a not in condition to fetch the rows we require:

    A[~((A.F_name.isin(B.F_name)) & (A.L_name.isin(B.L_name)))]
        F_name  L_name  Age Gender
    1   John    Smith   29  M
    3   Ahemed  Khan    26  M