Search code examples
rfiltersubset

R filter or subset based on list of values in two rows


I have a large dataset where I want to filter out rows with matching column values in another dataframe that is of a different length.

As a simple example:

df1
 A  B      date
 1  3 3-10-2022
 1  2 3-10-2022
 2  2  3-5-2022
 3 NA  4-5-2022
 3  2  4-5-2022
 4 NA  4-5-2022

df2
  A     date2
 1 3-10-2022
 2 4-10-2022
 3  4-5-2022

The goal is to exclude rows in df1 where column values both A and date match e.g. df1$A = df2$A AND df1$date = df2$date2 such that my new data frame is:

 Desired results
df3
 A  B      date
 2  2  3-5-2022
 4 NA  4-5-2022

I have tried the following but have found that my results do not appropriately exclude rows. I also get the error message of "longer object length is not a multiple of shorter object length" and am wondering if this is the issue.

df3 <- df1[!(df1$A == df2$A & df1$date == df2$date),]

Incorrect results:
df3
   A  B      date
  1  2 3-10-2022
  2  2  3-5-2022
  3 NA  4-5-2022
  3 2  4-5-2022
  4 NA  4-5-2022

The issue appears to be with rows where A includes duplicated value and/or the row contains an NA value, it is incorrectly retained. Can you please advise?


Solution

  • OK. Try "%in%" instead of "==".

    (df3 <- df1[!(df1$A %in% df2$A & df1$date %in% df2$date),])
    
      A  B     date
    3 2  2 3-5-2022
    6 4 NA 4-5-2022
    

    Because df2$A is a vector.