How to remove rows based on a condition in R


I have a dataframe like this:

Person Test  
1 new  
1 new  
1 old  
1 old  
2 new  
2 new  
2 old

I want to remove the rows for any person who has an unequal number of tests on the new system and the old system. In this case, person 2 was tested twice on the new system and once on the old one, so I want to drop all of that person's data (the last three rows). How can I do this on a large dataset?


Solution

  • You can count the frequency of each unique value for each person with table and select the groups where the count is the same for all unique values.

    This can be done in base R:

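    # Keep a person only if every test type they have occurs the same number of times.
    # Assumes Test is a character column (the default for data.frame() in R >= 4.0).
    subset(df, ave(Test, Person, FUN = function(x) length(unique(table(x)))) == 1)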
    
    #  Person Test
    #1      1  new
    #2      1  new
    #3      1  old
    #4      1  old
    

    With dplyr:

    library(dplyr)
    df %>% group_by(Person) %>% filter(n_distinct(table(Test)) == 1)
    

    And with data.table:

    library(data.table)
    setDT(df)[,.SD[uniqueN(table(Test)) == 1], Person]
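
One edge case is worth checking on a large dataset. When Test is a character column, the approaches above compare only the counts of the test types a person actually has, so a person with rows for just one type (say, only new) would produce a single count and be kept. The following is a minimal base R sketch that treats a missing type as a count of zero. It assumes the only expected types are new and old. The helper name `balanced_rows` and its `levels` argument are hypothetical, chosen for illustration, and are not part of the original answer.

```r
# Rebuild the example data from the question.
df <- data.frame(
  Person = c(1, 1, 1, 1, 2, 2, 2),
  Test   = c("new", "new", "old", "old", "new", "new", "old"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: returns TRUE for each row whose person has the same
# number of rows for every expected test type.
balanced_rows <- function(person, test, levels = c("new", "old")) {
  # Fixing the factor levels makes table() report a zero count for a test
  # type a person never took, instead of dropping that type.
  counts <- table(person, factor(test, levels = levels))
  # A person is balanced when the smallest and largest counts agree.
  ok <- apply(counts, 1, function(x) min(x) == max(x))
  # Map the per-person result back onto the rows.
  unname(ok[as.character(person)])
}

result <- df[balanced_rows(df$Person, df$Test), ]
print(result)
```

On the example data this keeps only the four rows for person 1, matching the output shown above. If the data has more than two test types, pass them through `levels`.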