Filtering subjects below accuracy threshold in R


I have a data frame containing a list of subjects below a certain accuracy threshold (i.e., 50% incorrect). I have another data frame containing all subjects (accurate and inaccurate) with all their data. Importantly, there are multiple rows per subject in this central data frame.

I need to remove the inaccurate subjects from the central data frame. How do I do this in R? I have already tried subset:

 filterdata<-subset(groupedmergedoutliers, subject==filtercorrectpercent$subject) 

'groupedmergedoutliers' is the central subject data frame; 'filtercorrectpercent' is the inaccurate subjects data frame.


Solution

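  • You are using ==, which tests for pairwise equality (e.g., is the first row of df1$subject equal to the first row of df2$subject, are the second rows equal, etc.). When the two vectors differ in length, as they do here, R recycles the shorter one, so the comparison does not answer the question you are asking. Consider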

    c(1, 1, 2, 3) == c(1, 2, 3, 4)
    # [1] TRUE FALSE FALSE FALSE
    

    Instead, you want to test whether each value of df1$subject appears anywhere in df2$subject. We can use %in% for this:

    c(1, 1, 2, 3) %in% c(1, 2, 3, 4)
    # [1] TRUE TRUE TRUE TRUE
    
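    # filtercorrectpercent lists the inaccurate subjects, so negate the
    # test with ! to drop those subjects rather than keep them
    filterdata <- subset(
        groupedmergedoutliers,
        !(subject %in% filtercorrectpercent$subject)
    )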
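
Because the goal is to remove the subjects listed in filtercorrectpercent, the %in% test has to be negated. Here is a minimal sketch on made-up toy data; the rt column and all values are assumptions for illustration only, not the asker's real data:

```r
# Hypothetical toy data standing in for the real data frames
groupedmergedoutliers <- data.frame(
    subject = c(1, 1, 2, 2, 3, 3),          # several rows per subject
    rt      = c(510, 495, 620, 640, 480, 470)  # assumed measurement column
)
filtercorrectpercent <- data.frame(subject = 2)  # the inaccurate subjects

# Keep only the rows whose subject is NOT in the inaccurate list
filterdata <- subset(
    groupedmergedoutliers,
    !(subject %in% filtercorrectpercent$subject)
)

# The same filter written with plain logical indexing
keep <- !(groupedmergedoutliers$subject %in% filtercorrectpercent$subject)
filterdata2 <- groupedmergedoutliers[keep, ]

print(filterdata)
```

Both forms drop every row belonging to subject 2 and keep all rows for subjects 1 and 3. The logical-indexing version may be preferable inside functions, since subset relies on non-standard evaluation.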