
Create a variable capturing the most frequent occurrence by group


Define:

df1 <-data.frame(
id=c(rep(1,3),rep(2,3)),
v1=as.character(c("a","b","b",rep("c",3)))
)

so that

> df1
  id v1
1  1  a
2  1  b
3  1  b
4  2  c
5  2  c
6  2  c

I want to create a third variable, freq, that contains the most frequent value of v1 within each id, so that

> df2
  id v1 freq
1  1  a    b
2  1  b    b
3  1  b    b
4  2  c    c
5  2  c    c
6  2  c    c

Solution

  • You can do this using ddply from the plyr package and a custom function that picks out the most frequent value:

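    library(plyr)  # provides ddply() and .()

    # For one id group: tabulate v1 and store the most frequent value in freq
    myFun <- function(x){
        tbl <- table(x$v1)
        x$freq <- rep(names(tbl)[which.max(tbl)],nrow(x))
        x
    }

    # Apply myFun to each id group and recombine the pieces
    ddply(df1,.(id),.fun=myFun)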
    

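    Note that, in the case of ties, which.max returns the first maximum in the table. Because table orders its entries by sorted value, that is the tied value that sorts first, not necessarily the one that appears first in the data. See ??which.is.max in the nnet package for an option that breaks ties randomly.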
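
If you would rather avoid the plyr dependency, the same idea can be sketched in base R with ave, which applies a function within each group and puts the result back in the original row order. This is only a minimal sketch, not part of the answer above: most_frequent is a hypothetical helper name, and ties are resolved the same way as in myFun (the first maximum in the table wins).

```r
# Same example data as in the question
df1 <- data.frame(
  id = c(rep(1, 3), rep(2, 3)),
  v1 = c("a", "b", "b", rep("c", 3)),
  stringsAsFactors = FALSE
)

# Hypothetical helper: return the most frequent value of a vector.
# On ties, which.max picks the first maximum in the table.
most_frequent <- function(x) {
  tbl <- table(x)
  names(tbl)[which.max(tbl)]
}

# ave() calls most_frequent once per id group and recycles the
# single result across all rows of that group
df2 <- df1
df2$freq <- ave(df1$v1, df1$id, FUN = most_frequent)
df2
```

Note that FUN has to be passed by name, because every unnamed argument after the first is treated by ave as a grouping variable.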