understanding output of "order" function in R


Given this data frame:

names <- c("Anna", "Bella", "Christian", "Derrick", "Emma")
scores <- c(10,5,10,9,8)
age <- c(16,16,17,18,21)
test <- data.frame(cbind(names,scores, age))

I want to create a variable that ranks by scores and uses names as a tie-breaker, i.e. although Anna and Christian both score 10, Anna's rank is 1 and Christian's is 2.

my code: test$rank_by_score <- order(test$scores, test$names, decreasing = T)

current output:

names      scores   age   rank_by_score
Anna       10       16    4
Bella      5        16    5
Christian  10       17    2
Derrick    9        18    3
Emma       8        21    1

desired output:

names      scores   age   rank_by_score
Anna       10       16    1
Bella      5        16    5
Christian  10       17    2
Derrick    9        18    3
Emma       8        21    4

What's happening in my current output, and how do I get to my desired output?

Edit: here is the output when age and scores are stored as numbers rather than factors:

names      scores   age   rank_by_score
Anna       10       16    3
Bella      5        16    1
Christian  10       17    4
Derrick    9        18    5
Emma       8        21    2

Solution

  • I think you are looking for rank rather than order, but rank accepts only a single vector. So we can first order the data by names and then use rank.

    test <- test[order(test$names), ]
    
    rank(-test$scores, ties.method = "first")
    #[1] 1 5 2 3 4
    

    See ?rank for the different ties.method options. With ties.method = "first", tied values get increasing ranks in the order they appear, so the entry that occurs first gets the smaller rank; ties.method = "last" does the opposite.

    rank(-test$scores, ties.method = "last")
    #[1] 2 5 1 3 4
    

    order does something different: it returns the indices that would put the original vector in sorted order.

    a1 <- order(test$scores, decreasing = TRUE)
    a1
    #[1] 1 3 4 5 2
    
    a2 <- test$scores
    a2
    #[1] 10  5 10  9  8
    

    Here, the output of order can be read as: a2[a1[1]] (10) is the largest value, followed by a2[a1[2]] (10), then a2[a1[3]] (9), and so on.
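
    Because order returns positions rather than ranks, one way to get the desired column in a single step is to apply order twice. This is a minimal sketch, assuming scores is numeric and there are no missing values. The double call to order is an alternative to the rank approach above, not part of the original answer.

    ```r
    names <- c("Anna", "Bella", "Christian", "Derrick", "Emma")
    scores <- c(10, 5, 10, 9, 8)
    age <- c(16, 16, 17, 18, 21)
    test <- data.frame(names, scores, age)

    # Inner order(): row indices sorted by descending score, ties broken by name.
    idx <- order(-test$scores, test$names)
    idx
    #[1] 1 3 4 5 2

    # Outer order(): inverts that permutation, giving each row its rank.
    test$rank_by_score <- order(idx)
    test$rank_by_score
    #[1] 1 5 2 3 4
    ```

    Unlike sorting the rows by name first, this leaves the row order of test unchanged.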

    data

    names <- c("Anna", "Bella", "Christian", "Derrick", "Emma")
    scores <- c(10,5,10,9,8)
    age <- c(16,16,17,18,21)
    test <- data.frame(names, scores, age)
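
    The outputs in the question seem to come from how the data frame was built there. cbind on a mix of character and numeric vectors produces a character matrix, so scores ends up as text (or as a factor in R versions before 4.0.0) and is sorted as text. A short sketch of the difference, where m, bad and good are hypothetical names used only for illustration:

    ```r
    names <- c("Anna", "Bella", "Christian", "Derrick", "Emma")
    scores <- c(10, 5, 10, 9, 8)
    age <- c(16, 16, 17, 18, 21)

    # cbind() coerces everything to one type: here, character.
    m <- cbind(names, scores, age)
    is.character(m)
    #[1] TRUE

    bad <- data.frame(m, stringsAsFactors = FALSE)  # built via cbind, as in the question
    good <- data.frame(names, scores, age)          # built column by column

    is.numeric(bad$scores)
    #[1] FALSE
    is.numeric(good$scores)
    #[1] TRUE

    # Compared as text, "10" sorts before "5".
    "10" < "5"
    #[1] TRUE
    ```

    Building the data frame column by column keeps scores numeric, so order and rank compare the values as numbers.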