I have a data frame called x1:
FID g1 g2 g3
0 19 20 13
1 16 11 14
2 15 20 11
and a data frame called x2:
FID g1
1 16
I want to alter x1 so that it does not include the row in x2:
FID g1 g2 g3
0 19 20 13
2 15 20 11
I've tried:
x1 = pd.concat([x1,x2]).drop_duplicates(keep=False)
but think this only works if the data frames have matching schemas. Can I just keep rows in x1 that don't share the FID values in x2?
You can use pd.Series.isin
to create a boolean series of the values in the FID
column of x1
that appear in the FID
column of x2
.
Then simply use pd.DataFrame.loc
with the operator ~
to invert the boolean series and select the rows of x1
whose values in the FID
column don't appear in the FID
column of x2
:
cond = x1.FID.isin(x2.FID)
x1.loc[~cond]
# output:
FID g1 g2 g3
0 0 19 20 13
2 2 15 20 11