Is there a way in R to use the rank function (or something similar) with multiple criteria and a ties.method?
Normally rank is used to rank values in a vector and if there are ties you can use one of the ties methods ("average", "random", "first", ...). But when ranking a column in a matrix, I would like to use multiple columns and one of the ties methods.
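For reference, here is a small sketch of how ties.method behaves on a single vector. The vector v is made up for illustration and is not part of the data below.

```r
v <- c(10, 20, 20, 30)          # hypothetical vector with one tie

rank(v)                         # 1.0 2.5 2.5 4.0 ("average" is the default)
rank(v, ties.method = "min")    # 1 2 2 4
rank(v, ties.method = "first")  # 1 2 3 4
```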
A minimal example:
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
y <- c(1, 4, 5, 5, 2, 8, 8, 1, 3, 3)
z <- c(0.2, 0.8, 0.5, 0.4, 0.2, 0.1, 0.1, 0.7, 0.3, 0.3)
m <- cbind(x = x, y = y, z = z)
Imagine I want to rank the y-values in the above matrix. If there are ties, I want the function to look at the z-values. If there are still ties after that, I want to use ties.method = "random".
In other words, a possible outcome could be:
       x y   z
 [1,]  1 1 0.2
 [2,]  8 1 0.7
 [3,]  5 2 0.2
 [4,]  9 3 0.3
 [5,] 10 3 0.3
 [6,]  2 4 0.8
 [7,]  4 5 0.4
 [8,]  3 5 0.5
 [9,]  6 8 0.1
[10,]  7 8 0.1
But it could also be this:
       x y   z
 [1,]  1 1 0.2
 [2,]  8 1 0.7
 [3,]  5 2 0.2
 [4,] 10 3 0.3
 [5,]  9 3 0.3
 [6,]  2 4 0.8
 [7,]  4 5 0.4
 [8,]  3 5 0.5
 [9,]  7 8 0.1
[10,]  6 8 0.1
Notice how the fourth and fifth rows differ (as do the ninth and tenth). I've been able to get the above outcome with the order function, i.e. m[order(m[,2], m[,3], sample(length(x))),], but I'd like to get the rank values, not the indices of a sorted matrix.
If you need elaboration on why I need the rank-values, feel free to ask and I'll edit the question with extra details. For now I think the minimal example will do.
EDIT: Changed dataframe to matrix as @alistaire pointed out.
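Since order(order(x)) gives the same result as rank(x) when there are no ties (see Why does order(order(x)) equal rank(x) in R?), you could just do
order(order(y, z, runif(length(y))))
to get the rank values. The runif() key comes last, so it only breaks ties that remain after y and z.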
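As a quick check, here is that one-liner applied to the y and z vectors from the question. The set.seed() call is my addition, only there to make the random tie-breaks repeatable.

```r
y <- c(1, 4, 5, 5, 2, 8, 8, 1, 3, 3)
z <- c(0.2, 0.8, 0.5, 0.4, 0.2, 0.1, 0.1, 0.7, 0.3, 0.3)

set.seed(42)  # assumption: fixed seed, only for reproducibility
r <- order(order(y, z, runif(length(y))))

# Rows that are unique on (y, z) get the same rank on every run
r[c(1, 2, 3, 4, 5, 8)]  # 1 6 8 7 3 2

# Each tied pair shares its two ranks, assigned in random order
sort(r[c(6, 7)])        # 9 10
sort(r[c(9, 10)])       # 4 5

# With ties present, order(order(y)) matches ties.method = "first"
all(order(order(y)) == rank(y, ties.method = "first"))  # TRUE
```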
Here's a more involved approach that lets you choose a ties.method. Only "average" and "random" are implemented here. It requires dplyr.
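library(dplyr)  # group_by(across(all_of(...))) needs dplyr >= 1.0
rank2 <- function(df, key1, key2, ties.method) {
  average <- function(x) mean(x)
  random <- function(x) x[sample.int(length(x))]  # sample(x) misbehaves when x has length 1
  tie_fun <- get(ties.method, mode = "function")  # look up the helper by name
  df$r <- order(order(df[[key1]], df[[key2]]))    # rank by key1, then key2
  df %>% group_by(across(all_of(c(key1, key2)))) %>% mutate(rr = tie_fun(r))
}
df <- as.data.frame(m)  # m is the matrix from the question
rank2(df, "y", "z", "average")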
# Source: local data frame [10 x 5]
# Groups: y, z [8]
# x y z r rr
# <dbl> <dbl> <dbl> <int> <dbl>
# 1 1 1 0.2 1 1.0
# 2 2 4 0.8 6 6.0
# 3 3 5 0.5 8 8.0
# 4 4 5 0.4 7 7.0
# 5 5 2 0.2 3 3.0
# 6 6 8 0.1 9 9.5
# 7 7 8 0.1 10 9.5
# 8 8 1 0.7 2 2.0
# 9 9 3 0.3 4 4.5
# 10 10 3 0.3 5 4.5
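If you'd rather avoid dplyr, one possible base R sketch is to collapse the keys into a single dense group id and hand that to rank(), which then supports every built-in ties.method. This is an alternative technique, not the function above, and the name rank_by is hypothetical.

```r
# rank_by() is a hypothetical helper: rank by several keys in priority order,
# resolving ties that remain on all keys with rank()'s own ties.method.
rank_by <- function(..., ties.method = "average") {
  keys <- list(...)
  o <- do.call(order, keys)  # row order when sorting by all keys
  # TRUE wherever a sorted row differs from the previous one on any key
  changed <- Reduce(`|`, lapply(keys, function(k) diff(k[o]) != 0))
  dense <- integer(length(o))
  dense[o] <- cumsum(c(TRUE, changed))  # rows equal on every key share one id
  rank(dense, ties.method = ties.method)
}

y <- c(1, 4, 5, 5, 2, 8, 8, 1, 3, 3)
z <- c(0.2, 0.8, 0.5, 0.4, 0.2, 0.1, 0.1, 0.7, 0.3, 0.3)

rank_by(y, z)                         # 1 6 8 7 3 9.5 9.5 2 4.5 4.5
rank_by(y, z, ties.method = "min")    # 1 6 8 7 3 9 9 2 4 4
rank_by(y, z, ties.method = "random") # tied pairs get 9/10 and 4/5 in random order
```

The "average" result matches the rr column above. The comparison uses exact equality on the keys, the same way order() treats them.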