I have a large dataset where I want to filter out rows with matching column values in another dataframe that is of a different length.
As a simple example:
df1
A B date
1 3 3-10-2022
1 2 3-10-2022
2 2 3-5-2022
3 NA 4-5-2022
3 2 4-5-2022
4 NA 4-5-2022
df2
A date2
1 3-10-2022
2 4-10-2022
3 4-5-2022
The goal is to exclude rows in df1 where both A and date match a row in df2, i.e. df1$A == df2$A AND df1$date == df2$date2, so that my new data frame is:
Desired results
df3
A B date
2 2 3-5-2022
4 NA 4-5-2022
I have tried the following, but my results do not exclude the right rows. I also get the warning "longer object length is not a multiple of shorter object length" and am wondering if this is the issue.
df3 <- df1[!(df1$A == df2$A & df1$date == df2$date2),]
Incorrect results:
df3
A B date
1 2 3-10-2022
2 2 3-5-2022
3 NA 4-5-2022
3 2 4-5-2022
4 NA 4-5-2022
The issue appears to be with rows where A is duplicated and/or the row contains an NA value: such rows are incorrectly retained. Can you please advise?
OK. Try "%in%" instead of "==".
(df3 <- df1[!(df1$A %in% df2$A & df1$date %in% df2$date2),])
A B date
3 2 2 3-5-2022
6 4 NA 4-5-2022
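This works because "==" compares the two vectors element by element and recycles the shorter one (which is where the "longer object length" warning comes from), whereas "%in%" tests whether each value of df1$A appears anywhere in the vector df2$A.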
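Note that the two "%in%" tests are independent of each other, so a row of df1 whose A matches one row of df2 and whose date matches a different row of df2 would also be dropped. That does not happen in the example data, but it could in a larger dataset. A minimal sketch of one way to match on the pair instead, assuming base R only and that dates are stored as character strings (the names key1 and key2 are made up for illustration):

```r
# Rebuild the example data from the question
df1 <- data.frame(
  A    = c(1, 1, 2, 3, 3, 4),
  B    = c(3, 2, 2, NA, 2, NA),
  date = c("3-10-2022", "3-10-2022", "3-5-2022",
           "4-5-2022", "4-5-2022", "4-5-2022")
)
df2 <- data.frame(
  A     = c(1, 2, 3),
  date2 = c("3-10-2022", "4-10-2022", "4-5-2022")
)

# Combine A and date into one key per row, so membership is
# tested on the (A, date) pair rather than on each column alone
key1 <- paste(df1$A, df1$date,  sep = "_")
key2 <- paste(df2$A, df2$date2, sep = "_")

# Keep only the rows of df1 whose pair does not occur in df2
df3 <- df1[!(key1 %in% key2), ]
df3  # keeps rows 3 and 6 of df1 (A = 2 and A = 4)
```

If dplyr is available, anti_join(df1, df2, by = c("A", "date" = "date2")) should express the same pairwise exclusion without building a key by hand.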