Given this data frame:
names <- c("Anna", "Bella", "Christian", "Derrick", "Emma")
scores <- c(10,5,10,9,8)
age <- c(16,16,17,18,21)
test <- data.frame(cbind(names,scores, age))
I wish to create a variable that ranks by scores and uses names as a tie-breaker, i.e. although Anna and Christian both score 10, Anna's rank should be 1 and Christian's 2.
my code: test$rank_by_score <- order(test$scores, test$names, decreasing = T)
current output:
names scores age rank_by_score
Anna 10 16 4
Bella 5 16 5
Christian 10 17 2
Derrick 9 18 3
Emma 8 21 1
desired output:
names scores age rank_by_score
Anna 10 16 1
Bella 5 16 5
Christian 10 17 2
Derrick 9 18 3
Emma 8 21 4
What's happening in my current output, and how do I get to my desired output?
Edit: output when age and scores are coded as integers rather than factors:
names scores age rank_by_score
Anna 10 16 3
Bella 5 16 1
Christian 10 17 4
Derrick 9 18 5
Emma 8 21 2
I think you are looking for `rank` rather than `order`, but `rank` can take only one column. So we can first order the data by `names` and then use `rank`.
test <- test[order(test$names), ]
rank(-test$scores, ties.method = "first")
#[1] 1 5 2 3 4
See `?rank` for the other `ties.method` options. With `ties.method = "first"`, tied values receive increasing ranks in the order they appear, so the entry that occurs first gets the smaller rank; `ties.method = "last"` does the opposite.
rank(-test$scores, ties.method = "last")
#[1] 2 5 1 3 4
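If sorting the data frame by `names` first is not convenient, the same ranks can probably be obtained without reordering any rows. The following is only a sketch under the assumption that `scores` is numeric, as in the data section of this answer; the helper name `ord` is made up for illustration.

```r
# Same data as in the "data" section of this answer
names  <- c("Anna", "Bella", "Christian", "Derrick", "Emma")
scores <- c(10, 5, 10, 9, 8)
age    <- c(16, 16, 17, 18, 21)
test   <- data.frame(names, scores, age)

# Row indices sorted by descending score, then by name (the tie-breaker).
# `ord` is a hypothetical helper name, not part of the original answer.
ord <- order(-test$scores, test$names)

# Row ord[1] gets rank 1, row ord[2] gets rank 2, and so on
test$rank_by_score <- integer(nrow(test))
test$rank_by_score[ord] <- seq_len(nrow(test))

test$rank_by_score
#[1] 1 5 2 3 4
```

Because the tie-breaker is passed to `order` as a second key, this version does not depend on the rows already being sorted by name.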
`order` returns the indices of the original vector in sorted order, i.e. the positions to read the vector at to get it sorted, not the rank of each element.
a1 <- order(test$scores, decreasing = TRUE)
a1
#[1] 1 3 4 5 2
a2 <- test$scores
a2
#[1] 10 5 10 9 8
Here, the output of `order` can be interpreted as: `a2[a1[1]]` (10) is the biggest number, followed by `a2[a1[2]]` (10), `a2[a1[3]]` (9), and so on.
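To make the difference between the two functions concrete, here is a small sketch showing that the permutation returned by `order` can be turned into ranks by applying `order` to it a second time. The variable names `idx` and `ranks` are only illustrative.

```r
scores <- c(10, 5, 10, 9, 8)

# order(): positions of the elements, listed from largest to smallest score
idx <- order(scores, decreasing = TRUE)
idx
scores[idx]   # the scores read in that order, i.e. sorted decreasingly

# Applying order() to that permutation inverts it,
# which gives the rank of each original element
ranks <- order(idx)
ranks

# Compare with rank(), breaking ties by first occurrence
rank(-scores, ties.method = "first")
```

Assigning `idx` directly as a column, as in the question, stores "which row comes first, second, ..." rather than "which place each row finished in", which is why the result looks scrambled.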
data
names <- c("Anna", "Bella", "Christian", "Derrick", "Emma")
scores <- c(10,5,10,9,8)
age <- c(16,16,17,18,21)
test <- data.frame(names, scores, age)
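Note that this differs from the question's `data.frame(cbind(names, scores, age))`. As far as I can tell, `cbind` on a mix of character and numeric vectors builds a character matrix first, so `scores` and `age` stop being numeric, which would explain the factor-versus-integer difference mentioned in the question. A minimal sketch to check this (the names `via_cbind` and `direct` are made up here):

```r
names  <- c("Anna", "Bella", "Christian", "Derrick", "Emma")
scores <- c(10, 5, 10, 9, 8)
age    <- c(16, 16, 17, 18, 21)

# cbind() on character and numeric vectors yields a character matrix, so
# every column ends up character (or factor in R versions before 4.0.0)
via_cbind <- data.frame(cbind(names, scores, age))
sapply(via_cbind, class)

# Passing the vectors directly keeps scores and age numeric
direct <- data.frame(names, scores, age)
sapply(direct, class)

# If the data already went through cbind(), convert via character first;
# as.numeric() applied directly to a factor returns the level codes instead
via_cbind$scores <- as.numeric(as.character(via_cbind$scores))
```

With character or factor scores, "10" sorts before "5", so ordering or ranking them gives results that look wrong for numbers.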