I want to group my data.table by ID and remove all groups where the count
does not have the same value across the years.
Maybe something along the lines that if ID and count combination is not unique, remove all rows with that ID?
I am looking for a solution suitable for large data.table
DATA:
# example data
ID <- c(rep("A", 5), rep("B", 6), rep("C", 2), rep("D", 3), rep("E", 4))
count <- c(rep(3, 5), rep(4, 6), rep(1, 2), c(1,3,3), rep(1, 4))
year <- as.numeric(c(rep(c(2012, 2013, 2014, 2015), 4), 2012, 2013, 2015, 2016))
mydata <- cbind(ID, year, count)
mydata <- as.data.table(mydata)
mydata <- setorder(unique(mydata))
Not sure this is the most elegant solution, maybe somebody has a better one. In the meantime:
mydata[, k := length(unique(count)), by=ID][k==1][,k:=NULL]
EDIT: Copying from the link above, the proper solution:
mydata[ , if(uniqueN(count) == 1) .SD, by = ID]