Filtering subjects below accuracy threshold in R

I have a data frame listing the subjects who fall below a certain accuracy threshold (i.e., 50% incorrect). I have another, central data frame containing all subjects (accurate and inaccurate) with all their data. Importantly, there are multiple rows per subject in this central data frame.

I need to remove the inaccurate subjects from the central data frame. How do I do this in R? I have already tried subset:

 filterdata<-subset(groupedmergedoutliers, subject==filtercorrectpercent$subject)

'groupedmergedoutliers' is the central subject data frame; 'filtercorrectpercent' is the inaccurate-subjects data frame.

Solution

You are using ==, which tests for pairwise equality (e.g., is the first row of df1$subject equal to the first row of df2$subject, are the second rows equal, etc.). Consider

c(1, 1, 2, 3) == c(1, 2, 3, 4)
# [1] TRUE FALSE FALSE FALSE
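
When the two vectors differ in length, as they would in the question (many rows per subject on one side, one row per subject on the other), R recycles the shorter vector before comparing. A minimal sketch, assuming made-up subject IDs (`all_subjects` and `flagged` are hypothetical names, not from the question):

```r
# Hypothetical subject column of the central data frame: two rows per subject
all_subjects <- c(1, 1, 2, 2, 3, 3)
# Hypothetical list of flagged (inaccurate) subjects
flagged <- c(2, 3)

# `flagged` is recycled to c(2, 3, 2, 3, 2, 3) and compared position by position,
# so only some of the rows for subjects 2 and 3 come back TRUE
all_subjects == flagged
# [1] FALSE FALSE TRUE FALSE FALSE TRUE
```

Because 6 is a multiple of 2, R gives no warning here, so the partial match is easy to miss.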

Instead, you want to be testing if each row of df1$subject is in any row of df2$subject. We can use %in% for this:

c(1, 1, 2, 3) %in% c(1, 2, 3, 4)
# [1] TRUE TRUE TRUE TRUE

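# Keeps the rows whose subject is listed in filtercorrectpercent.
# To drop those subjects instead (as the question asks), negate the test:
#   !(subject %in% filtercorrectpercent$subject)
filterdata <- subset(
    groupedmergedoutliers,
    subject %in% filtercorrectpercent$subject
)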
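
Since the question asks to remove the inaccurate subjects, here is a small self-contained sketch of the negated form. The data frame names match the question, but their contents (the `rt` column and all values) are invented purely for illustration:

```r
# Made-up stand-in for the central data frame (multiple rows per subject)
groupedmergedoutliers <- data.frame(
    subject = c(1, 1, 2, 2, 3, 3),
    rt      = c(510, 480, 620, 655, 430, 445)  # hypothetical measurement column
)
# Made-up stand-in for the inaccurate-subjects data frame
filtercorrectpercent <- data.frame(subject = c(2))

# Negating %in% keeps only rows whose subject is NOT in the inaccurate list
filterdata <- subset(
    groupedmergedoutliers,
    !(subject %in% filtercorrectpercent$subject)
)
filterdata$subject
# [1] 1 1 3 3
```

All rows for subject 2 are dropped, however many times that subject appears.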