r, function, mode

Calculating the mode or 2nd/3rd/4th most common value


Surely there has to be a function out there in some package for this?

I've searched and I've found this function to calculate the mode:

Mode <- function(x) {
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}

But I'd like a function that lets me easily calculate the 2nd/3rd/4th/nth most common value in a column of data.

Ultimately I will apply this function to a large number of dplyr::group_by()s.

Thank you for your help!


Solution

  • Maybe you could try

    f <- function (x) with(rle(sort(x)), values[order(lengths, decreasing = TRUE)])
    

    This gives unique vector values sorted by decreasing frequency. The first will be the mode, the 2nd will be 2nd most common, etc.
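
    To pull out just the n-th most common value, one option is to index the result of f(). The sketch below is only an illustration: the name nth_mode and its argument n are hypothetical and not part of the original answer. It relies on two base R behaviours: sort() drops NA values by default, and indexing past the end of a vector returns NA.

    ```r
    # f() as defined above: unique values ordered by decreasing frequency
    f <- function (x) with(rle(sort(x)), values[order(lengths, decreasing = TRUE)])

    # Hypothetical helper: n-th most common value of x.
    # Returns NA when x has fewer than n distinct values.
    nth_mode <- function (x, n = 1) f(x)[n]

    x <- c(3, 1, 3, 2, 3, 2)
    nth_mode(x, 1)  # 3 (appears three times)
    nth_mode(x, 2)  # 2 (appears twice)
    nth_mode(x, 3)  # 1 (appears once)
    nth_mode(x, 4)  # NA (only three distinct values)
    ```

    Ties are broken in favour of the smaller value, because the values are sorted in ascending order before being ranked by frequency.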

    Another method is based on table():

    g <- function (x) as.numeric(names(sort(table(x), decreasing = TRUE)))
    

    But this is not recommended, because the input vector x is first coerced to a factor, which is very slow for a large vector. On exit, we also have to extract the table's character names and coerce them back to numeric.
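
    If you want to check the speed difference on your own machine, a rough timing sketch could look like the following. The vector size and the seed are arbitrary choices for illustration. No timings are shown because they depend on hardware and R version.

    ```r
    f <- function (x) with(rle(sort(x)), values[order(lengths, decreasing = TRUE)])
    g <- function (x) as.numeric(names(sort(table(x), decreasing = TRUE)))

    # Arbitrary test data: one million draws from 1000 distinct integers
    set.seed(1)
    x <- sample.int(1000, 1e6, replace = TRUE)

    system.time(rf <- f(x))  # rle-based version
    system.time(rg <- g(x))  # table-based version

    # Both should return the same ranking of values
    all.equal(rf, rg)
    ```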


    Example

    set.seed(0); x <- rpois(100, 10)
    f(x)
    # [1] 11 12  7  9  8 13 10 14  5 15  6  2  3 16
    

    Let's compare with the contingency table from table():

    tab <- sort(table(x), decreasing = TRUE)
    # 11 12  7  9  8 13 10 14  5 15  6  2  3 16 
    # 14 14 11 11 10 10  9  7  5  4  2  1  1  1
    
    as.numeric(names(tab))
    # [1] 11 12  7  9  8 13 10 14  5 15  6  2  3 16
    

    So the results are the same.
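
    Since the goal is to apply this per group, here is a minimal sketch using base R's tapply(). The data frame df and its columns grp and val are made up for illustration. With dplyr, the same expression can go inside summarise() after group_by(), as in the commented line.

    ```r
    f <- function (x) with(rle(sort(x)), values[order(lengths, decreasing = TRUE)])

    # Made-up example data: two groups with known frequencies
    df <- data.frame(
      grp = c("a", "a", "a", "a", "a", "b", "b", "b"),
      val = c(1, 1, 1, 2, 2, 5, 5, 7)
    )

    # 2nd most common value of val within each group
    second <- tapply(df$val, df$grp, function (v) f(v)[2])
    second  # a = 2, b = 7

    # dplyr equivalent (requires the dplyr package):
    # dplyr::summarise(dplyr::group_by(df, grp), second = f(val)[2])
    ```

    A group with fewer than two distinct values gets NA, because f(v)[2] indexes past the end of the ranked vector.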