Search code examples
rcountduplicatesdata.tableunique

data.table - removing all groups with unequal count in R


I want to group my data.table by ID and remove all groups where the count does not have the same value across the years.

Maybe something along the lines that if ID and count combination is not unique, remove all rows with that ID?

I am looking for a solution suitable for large data.table

DATA:

# example data
ID <- c(rep("A", 5), rep("B", 6), rep("C", 2), rep("D", 3), rep("E", 4))
count <- c(rep(3, 5), rep(4, 6), rep(1, 2), c(1,3,3), rep(1, 4))
year <- as.numeric(c(rep(c(2012, 2013, 2014, 2015), 4), 2012, 2013, 2015, 2016))
mydata <- cbind(ID, year, count)
mydata <- as.data.table(mydata)
mydata <- setorder(unique(mydata))

Solution

  • Not sure this is the most elegant solution, maybe somebody has a better one. In the meantime:

    mydata[, k := length(unique(count)), by=ID][k==1][,k:=NULL]
    

    EDIT: Copying from the link above, the proper solution:

    mydata[ , if(uniqueN(count) == 1) .SD, by = ID]