I have a data frame containing a list of subjects below a certain accuracy threshold (i.e., 50% incorrect): 1. I have another data frame containing all subjects (accurate and inaccurate) with all their data. Importantly, there are multiple rows per subject in this central data frame: 2.
I need to remove the inaccurate subjects from the central data frame in 2. How do I do this in R? I have already tried subset:
filterdata <- subset(groupedmergedoutliers, subject == filtercorrectpercent$subject)
'groupedmergedoutliers' is the central subject data frame; 'filtercorrectpercent' is the inaccurate subjects data frame.
You are using ==, which tests for pairwise equality (e.g., is the first row of df1$subject equal to the first row of df2$subject, are the second rows equal, etc.). Consider
c(1, 1, 2, 3) == c(1, 2, 3, 4)
# [1] TRUE FALSE FALSE FALSE
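The problem is easier to see when the two vectors differ in length, as they do when one data frame has several rows per subject. Here is a minimal sketch with made-up subject IDs; `subject` and `bad` are hypothetical names, not objects from the question:

```r
# Hypothetical subject IDs: two rows per subject, as in the central data
subject <- c("s1", "s1", "s2", "s2", "s3", "s3")
# Hypothetical vector of flagged subjects
bad <- c("s2", "s3")

# `==` silently recycles `bad` to s2, s3, s2, s3, s2, s3 and compares
# position by position, so only rows 3 and 6 happen to match
subject == bad
# [1] FALSE FALSE  TRUE FALSE FALSE  TRUE
```

Rows 4 and 5 belong to flagged subjects too, but they are missed because the comparison depends on position rather than membership.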
Instead, you want to be testing if each row of df1$subject is in any row of df2$subject. We can use %in% for this:
c(1, 1, 2, 3) %in% c(1, 2, 3, 4)
# [1] TRUE TRUE TRUE TRUE
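# filtercorrectpercent holds the inaccurate subjects, so negate the
# membership test with ! to drop their rows and keep everyone else
filterdata <- subset(
  groupedmergedoutliers,
  !(subject %in% filtercorrectpercent$subject)
)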
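To check the behaviour on something small, here is a sketch with toy data. The data frames `all_subjects` and `inaccurate`, the `rt` column, and all values are invented for illustration; they only stand in for groupedmergedoutliers and filtercorrectpercent:

```r
# Toy stand-in for the central data frame: several rows per subject
all_subjects <- data.frame(
  subject = c("s1", "s1", "s2", "s2", "s3", "s3"),
  rt      = c(510, 480, 620, 700, 450, 470),
  stringsAsFactors = FALSE
)
# Toy stand-in for the data frame of flagged (inaccurate) subjects
inaccurate <- data.frame(subject = "s2", stringsAsFactors = FALSE)

# Rows that belong to a flagged subject
flagged_rows <- subset(all_subjects, subject %in% inaccurate$subject)

# Rows that belong to everyone else: negate the membership test with !
kept_rows <- subset(all_subjects, !(subject %in% inaccurate$subject))

unique(kept_rows$subject)
# [1] "s1" "s3"
```

Whether the test is negated depends on which group should remain: plain %in% keeps the listed subjects, while !(... %in% ...) removes them.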