
R - aggregate factor/character variable


I have a data frame like this:

df <- data.frame(cluster = c('1','1','2','3','3','3'), class = c('A','B','C','B','B','C'))

For each cluster (1, 2, 3), I would like to get the class that appears most often. In case of a tie, it would be good to get some indication of it, for example the combination of the tied classes (or just NA if that is not possible). For my example, the result would look something like this:

 cluster  class.max
   1        'A B' (or NA)
   2         'C'
   3         'B'

Maybe I should use aggregate(), but I don't know how.


Solution

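  • rank() has a ties.method argument for dealing with ties. With ties.method="min" on the negated counts, every class tied for the highest count gets rank 1, so all of them are kept and pasted together: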

    aggregate(class~cluster,df,function(x) paste(names(table(x)[rank(-1*table(x),ties.method="min")==1]),collapse=" "))
      cluster class
    1       1   A B
    2       2     C
    3       3     B
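
    If NA is preferred for ties, the same idea can be sketched with a small helper instead of rank(). This is only a sketch: most_frequent() and its na_on_tie argument are hypothetical names made up for illustration, not part of base R, and the data frame is the one from the question.

```r
# Example data from the question
df <- data.frame(cluster = c('1','1','2','3','3','3'),
                 class   = c('A','B','C','B','B','C'))

# most_frequent() is a hypothetical helper, not a base R function.
# It returns the most frequent value of x; on a tie it returns either
# the tied values pasted together or NA, depending on na_on_tie.
most_frequent <- function(x, na_on_tie = FALSE) {
  counts <- table(as.character(x))             # counts per class; as.character() ignores unused factor levels
  top <- names(counts)[counts == max(counts)]  # every class that reaches the maximum count
  if (na_on_tie && length(top) > 1) return(NA_character_)
  paste(top, collapse = " ")
}

# Tied classes combined into one string
res_combined <- aggregate(class ~ cluster, data = df, FUN = most_frequent)

# Tied clusters reported as NA (extra arguments are passed on to FUN)
res_na <- aggregate(class ~ cluster, data = df, FUN = most_frequent, na_on_tie = TRUE)

print(res_combined)
print(res_na)
```

    Comparing the counts against max(counts) does the same job as the rank() call above; which of the two reads better is a matter of taste.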