Surely there has to be a function out there in some package for this?
I've searched and I've found this function to calculate the mode:
Mode <- function(x) {
ux <- unique(x)
ux[which.max(tabulate(match(x, ux)))]
}
But I'd like a function that lets me easily calculate the 2nd/3rd/4th/nth most common value in a column of data.
Ultimately I will apply this function to a large number of dplyr::group_by()
s.
Thank you for your help!
Maybe you could try
f <- function (x) with(rle(sort(x)), values[order(lengths, decreasing = TRUE)])
This gives unique vector values sorted by decreasing frequency. The first will be the mode, the 2nd will be 2nd most common, etc.
Another method is to based on table()
:
g <- function (x) as.numeric(names(sort(table(x), decreasing = TRUE)))
But this is not recommended, as input vector x
will be coerced to factor first. If you have a large vector, this is very slow. Also on exit, we have to extract character names and of the table and coerce it to numeric.
Example
set.seed(0); x <- rpois(100, 10)
f(x)
# [1] 11 12 7 9 8 13 10 14 5 15 6 2 3 16
Let's compare with the contingency table from table
:
tab <- sort(table(x), decreasing = TRUE)
# 11 12 7 9 8 13 10 14 5 15 6 2 3 16
# 14 14 11 11 10 10 9 7 5 4 2 1 1 1
as.numeric(names(tab))
# [1] 11 12 7 9 8 13 10 14 5 15 6 2 3 16
So the results are the same.