Search code examples
pythonpandasdataframegroup-bysubtraction

Find extras between two columns of two dataframes - subtract


I have 2 dataframes (df_a and df_b) with 2 columns: 'Animal' and 'Name'.

In the bigger dataframe, there are more animals of the same type than the other. How do I find the extra animals of the same type by name? i.e. (df_a - df_b)

Dataframe A

Animal  Name
dog     john
dog     henry
dog     betty
dog     smith
cat     charlie
fish    tango
lion    foxtrot
lion    lima

Dataframe B

Animal  Name
dog     john
cat     charlie
dog     betty
fish    tango
lion    foxtrot
dog     smith

In this case, the extra would be:

Animal  Name
dog     henry
lion    lima

Attempt: I tried using

df_c = df_a.subtract(df_b, axis='columns')

but got the following error "unsupported operand type(s) for -: 'unicode' and 'unicode'", which makes sense since they are strings not numbers. Is there any other way?


Solution

  • You are looking for a left_only merge.

    merged = pd.merge(df_a,df_b, how='outer', indicator=True)
    merged.loc[merged['_merge'] == 'left_only'][['Animal', 'Name']]
    

    Output

        Animal  Name
    1   dog    henry
    7   lion    lima
    

    Explanation:

    merged = pd.merge(df_a,df_b, how='outer', indicator=True)
    

    Gives:

      Animal    Name    _merge
    0   dog     john    both
    1   dog     henry   left_only
    2   dog     betty   both
    3   dog     smith   both
    4   cat     charlie both
    5   fish    tango   both
    6   lion    foxtrot both
    7   lion    lima    left_only
    

    The extra animals are in df_a only, which is denoted by left_only.