I have 2 simple data frames
df1
colA colB
0 e5b1b9fc-ade9-4501-a66b-ef2ecd57483e.d9967f258... 2ZWR52QYZ86H
1 8d127d82-cfa4-421f-9081-cf35132b8248.f0865b3b9... 61RPLMR5BFFT
2 005c8e84-98b4-402d-a24e-6a63edad0598.16b6f0f9f... 7L256IQTB1M1
3 d87f6dfd-1c55-4ce5-9b84-e80b6aa958d8.3f0901c7f... 3H9SLNATBJ01
4 cf89c9dd-004e-40e7-8120-3397ce5fd97e.f571bc175... 4Z8RT5VZNOQ8
5 9eebc606-e8d0-40e3-9ba5-6d3e1b77bc64.0dc42d528... 1DEOAHZL2JFC
6 7112aef1-5fa0-4459-aa1b-15cba2f96ec5.6a9ecb28d... 2CIISYGAAV69
7 e30d901c-34e6-4974-9b9e-1fe206ed6fca.701f1358e... 2NLLJ70RXKW2
8 13677989-8979-4422-a471-7fda22ea4f6d.e00051e45... 6P60G721DVHK
df2
0 e5b1b9fc-ade9-4501-a66b-ef2ecd57483e.d9967f258... 2ZWR52QYZ86H
1 8d127d82-cfa4-421f-9081-cf35132b8248.f0865b3b9... 61RPLMR5BFFT
2 005c8e84-98b4-402d-a24e-6a63edad0598.16b6f0f9f... 7L256IQTB1M1
3 d87f6dfd-1c55-4ce5-9b84-e80b6aa958d8.3f0901c7f... 3H9SLNATBJ01
4 cf89c9dd-004e-40e7-8120-3397ce5fd97e.f571bc175... 4Z8RT5VZNOQ8
now, i want to isolate the rows in df1, that are not in df2
so i tried
df1.subtract(df2)
but i get
result[mask] = op(xrav[mask], yrav[mask])
TypeError: unsupported operand type(s) for -: 'str' and 'str'
What am i doing wrong?
You can use merge
to get the rows in df1
that are not in df2
.
Using indicator
parameter to check which rows are found in both data frames, only in the left data frame, or only in the right data frame.
merged_df = df1.merge(df2, on=['colA', 'colB'], how='outer', indicator=True)
result_df = merged_df.loc[merged_df['_merge'] == 'left_only']
Example output:
colA | colB | _merge | |
---|---|---|---|
5 | 9eebc606-e8d0-40e3-9ba5-6d3e1b77bc64.0dc42d528 | 1DEOAHZL2JFC | left_only |
6 | 7112aef1-5fa0-4459-aa1b-15cba2f96ec5.6a9ecb28d | 2CIISYGAAV69 | left_only |
7 | e30d901c-34e6-4974-9b9e-1fe206ed6fca.701f1358e | 2NLLJ70RXKW2 | left_only |
8 | 13677989-8979-4422-a471-7fda22ea4f6d.e00051e45 | 6P60G721DVHK | left_only |
About your comment:
this worked for me
concat_df = pd.concat([df1,df2]).drop_duplicates(keep=False, inplace=False)
That option is only valid if df2 doesn't have rows that aren't in df1 (All rows in df2 are in df1). If both df1 and df2 have unique rows, using pd.concat() and dropping duplicates won't only isolate the rows in df1
that are not in df2
as you requested in your question, you would have all rows unique in df2 too