Select groups based on number of unique / distinct values


I have a data frame like the one below:

sample <- data.frame(ID = 1:9,
                     Group = c('AA','AA','AA','BB','BB','CC','CC','BB','CC'),
                     Value = c(1,1,1,2,2,2,3,2,3))

ID       Group    Value
1        AA       1
2        AA       1
3        AA       1
4        BB       2
5        BB       2
6        CC       2
7        CC       3
8        BB       2
9        CC       3

I want to select groups according to the number of distinct (unique) values within each group. For example, select groups where all values within the group are the same (one distinct value per group). If you look at the group CC, it has more than one distinct value (2 and 3) and should thus be removed. The other groups, with only one distinct value, should be kept. Desired output:

ID       Group    Value
1        AA       1
2        AA       1
3        AA       1
4        BB       2
5        BB       2
8        BB       2

What is a simple and fast way to do this in R?


Solution

  • You can build a row selector for sample with ave in several ways. Each one computes a per-group statistic that shows whether the group has only one distinct value.

    sample[ ave( sample$Value, sample$Group, FUN = function(x) length(unique(x)) ) == 1,]
    

    or

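    # abs() keeps positive and negative differences from cancelling out
    # (without it, a group such as c(2, 1, 3) would sum to 0 and be kept by mistake)
    sample[ ave( sample$Value, sample$Group, FUN = function(x) sum(abs(x - x[1])) ) == 0,]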
    

    or

    sample[ ave( sample$Value, sample$Group, FUN = function(x) diff(range(x)) ) == 0,]
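
To see what the selector is doing, here is a minimal self-contained sketch using the data frame from the question and the length(unique(x)) variant. The intermediate names distinct_per_row, keep and result are only illustrative and do not appear in the original answer.

```r
# Rebuild the data frame from the question
sample <- data.frame(ID = 1:9,
                     Group = c('AA','AA','AA','BB','BB','CC','CC','BB','CC'),
                     Value = c(1,1,1,2,2,2,3,2,3))

# ave() returns a vector with one entry per row; each entry holds the
# result of FUN computed over that row's group
distinct_per_row <- ave(sample$Value, sample$Group,
                        FUN = function(x) length(unique(x)))

# Logical mask: TRUE for rows whose group has exactly one distinct value
keep <- distinct_per_row == 1

# Subset the rows; all columns are kept
result <- sample[keep, ]
print(result)
```

Changing the comparison (for example distinct_per_row >= 2) should select groups by a different number of distinct values, which is the more general case mentioned in the title.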