I have a data frame like this:
df <- data.frame(cluster = c('1','1','2','3','3','3'), class = c('A','B','C','B','B','C'))
For each cluster (1, 2, 3), I would like to get the class that appears most often. In case of a tie, it would be great to get some indication of it, for example the combination of the tied classes (or, if that is not possible, just NA). For my example, the result would look something like this:
cluster class.max
1 'A B' (or NA)
2 'C'
3 'B'
Maybe I should use aggregate(), but I don't know how.
rank() has several ways of dealing with ties, selected through its ties.method argument:
aggregate(class ~ cluster, df, function(x) {
  tab <- table(x)
  # ties.method = "min" gives every tied class rank 1
  paste(names(tab)[rank(-tab, ties.method = "min") == 1], collapse = " ")
})
cluster class
1 1 A B
2 2 C
3 3 B
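If the rank() call feels opaque, the same idea can be sketched by comparing each count against the maximum count directly. This is only an illustrative alternative, not part of the answer above: the helper name most_frequent and its ties_as_na argument are made up for this example, and the column is renamed to class.max to match the layout asked for in the question.

```r
# Example data from the question
df <- data.frame(cluster = c('1','1','2','3','3','3'),
                 class   = c('A','B','C','B','B','C'))

# Hypothetical helper (name and ties_as_na argument are assumptions):
# returns the most frequent value(s) of x, joined by a space on a tie,
# or NA when ties_as_na = TRUE and more than one value shares the top count
most_frequent <- function(x, ties_as_na = FALSE) {
  counts <- table(as.character(x))           # as.character() ignores unused factor levels
  top <- names(counts)[counts == max(counts)] # every value that reaches the maximum
  if (ties_as_na && length(top) > 1) return(NA_character_)
  paste(top, collapse = " ")
}

# Tied classes are combined into one string
res <- aggregate(class ~ cluster, data = df, FUN = most_frequent)
names(res)[2] <- "class.max"

# Tied clusters get NA instead; extra arguments are passed on to FUN
res_na <- aggregate(class ~ cluster, data = df, FUN = most_frequent,
                    ties_as_na = TRUE)
names(res_na)[2] <- "class.max"
```

Here res should hold the same three rows as the output above, while res_na has NA for cluster 1, where A and B are tied.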