Search code examples
pythonpandasdataframedifference

Pandas drop rows in one dataframe that share a common value with a rows in a column of another dataframe


I have a data frame called x1:

 FID  g1    g2    g3
  0   19    20    13
  1   16    11    14
  2   15    20    11

and a data frame called x2:

 FID  g1   
  1   16  

I want to alter x1 so that it does not include the row in x2:

 FID  g1    g2    g3
  0   19    20    13
  2   15    20    11

I've tried:

x1 = pd.concat([x1,x2]).drop_duplicates(keep=False)

but think this only works if the data frames have matching schemas. Can I just keep rows in x1 that don't share the FID values in x2?


Solution

  • You can use pd.Series.isin to create a boolean series of the values in the FID column of x1 that appear in the FID column of x2.

    Then simply use pd.DataFrame.loc with the operator ~ to invert the boolean series and select the rows of x1 whose values in the FID column don't appear in the FID column of x2:

    cond = x1.FID.isin(x2.FID)
    x1.loc[~cond] 
    # output:
       FID  g1  g2  g3
    0    0  19  20  13
    2    2  15  20  11