I have 2 dataframes (df_a and df_b) with 2 columns: 'Animal' and 'Name'.
In the bigger dataframe, there are more animals of the same type than the other. How do I find the extra animals of the same type by name? i.e. (df_a - df_b)
Dataframe A
Animal Name
dog john
dog henry
dog betty
dog smith
cat charlie
fish tango
lion foxtrot
lion lima
Dataframe B
Animal Name
dog john
cat charlie
dog betty
fish tango
lion foxtrot
dog smith
In this case, the extra would be:
Animal Name
dog henry
lion lima
Attempt: I tried using
df_c = df_a.subtract(df_b, axis='columns')
but got the following error "unsupported operand type(s) for -: 'unicode' and 'unicode'", which makes sense since they are strings not numbers. Is there any other way?
You are looking for a left_only
merge.
merged = pd.merge(df_a,df_b, how='outer', indicator=True)
merged.loc[merged['_merge'] == 'left_only'][['Animal', 'Name']]
Output
Animal Name
1 dog henry
7 lion lima
Explanation:
merged = pd.merge(df_a,df_b, how='outer', indicator=True)
Gives:
Animal Name _merge
0 dog john both
1 dog henry left_only
2 dog betty both
3 dog smith both
4 cat charlie both
5 fish tango both
6 lion foxtrot both
7 lion lima left_only
The extra animals are in df_a
only, which is denoted by left_only
.