Search code examples
rexcelbreakrank

How to break the rank of a variable into decimals according to the rank of a second variable in R, in order to break ties


How, in R, break ties from V1 according to the rank of V2, using the excel rank-based method, as described on youtube https://www.youtube.com/watch?v=_fD5vhb36j0

More precisely, the question is not ordering, but rather to adjust the rank of V1 (by 'adjust', I mean 'split into decimals') according to that of V2, in order to adjust integer ties from V1 to the ascending rank of V2.

Let assume these initial data, given as example:

dat0 <- data.frame(V1=c(18,20,21,22,22,23,23,23,23,24,24,24,25,25,25,25,25,26,26,27),
V2=c(13,54,3,7,23,10,13,24,25,5,9,16,6,8,11,13,18,9,10,9))

The goal to reach is:

> dat0
   V1 V2 rank_V1 tiebreak rank_V1_adjusted V1_adjusted
1  18 13       1     0.00             1.00       18.00
2  20 54       2     0.00             2.00       20.00
3  21  3       3     0.00             3.00       21.00
4  22  7       4     0.20             4.20       22.20
5  22 23       4     0.85             4.85       22.85
6  23 10       6     0.45             6.45       23.45
7  23 13       6     0.60             6.60       23.60
8  23 24       6     0.90             6.90       23.90
9  23 25       6     0.95             6.95       23.95
10 24  5      10     0.10            10.10       24.10
11 24  9      10     0.30            10.30       24.30
12 24 16      10     0.75            10.75       24.75
13 25  6      13     0.15            13.15       25.15
14 25  8      13     0.25            13.25       25.25
15 25 11      13     0.55            13.55       25.55
16 25 13      13     0.60            13.60       25.60
17 25 18      13     0.80            13.80       25.80
18 26  9      18     0.30            18.30       26.30
19 26 10      18     0.45            18.45       26.45
20 27  9      20     0.00            20.00       27.00

The excel formulaes based on the youtube tutos are:

V1 (column A): Vector V1
V2 (column B): vector V2
rank_V1 (column C): =RANG(A2;$A$2:$A$21;1)  
tiebreak (column D): =SI(NB.SI($C$2:$C$21;C2)>1;RANG(B2;$B$2:$B$21;1)/NB(B:B);0)  
rank_V1_adjusted (column E): =C2+D2  
V1_adjusted (column F): =SI(D2<>0;A2+D2;A2)

Whatever the way, a function that would do this would be really useful. Thanks for your help


Solution

  • It's pretty straightforward if you know dplyr:

    library(dplyr)
    
    dat0 <- data.frame(V1=c(18,20,21,22,22,23,23,23,23,24,24,24,25,25,25,25,25,26,26,27),
                       V2=c(13,54,3,7,23,10,13,24,25,5,9,16,6,8,11,13,18,9,10,9))
    
    dat0 <- dat0 %>%
      arrange(V1, V2) %>%
      mutate(rank_V1 = rank(V1, ties="min"),
             tiebreak = rank(V2, ties="min")/n()) %>%
      group_by(V1) %>%
      mutate(tiebreak = case_when(n() == 1 ~ 0, T ~ tiebreak)) %>%
      mutate(rank_V1_adjusted = rank_V1 + tiebreak,
             V1_adjusted = V1 + tiebreak) %>%
      as.data.frame
             
    
       V1 V2 rank_V1 tiebreak rank_V1_adjusted V1_adjusted
    1  18 13       1     0.00             1.00       18.00
    2  20 54       2     0.00             2.00       20.00
    3  21  3       3     0.00             3.00       21.00
    4  22  7       4     0.20             4.20       22.20
    5  22 23       4     0.85             4.85       22.85
    6  23 10       6     0.45             6.45       23.45
    7  23 13       6     0.60             6.60       23.60
    8  23 24       6     0.90             6.90       23.90
    9  23 25       6     0.95             6.95       23.95
    10 24  5      10     0.10            10.10       24.10
    11 24  9      10     0.30            10.30       24.30
    12 24 16      10     0.75            10.75       24.75
    13 25  6      13     0.15            13.15       25.15
    14 25  8      13     0.25            13.25       25.25
    15 25 11      13     0.55            13.55       25.55
    16 25 13      13     0.60            13.60       25.60
    17 25 18      13     0.80            13.80       25.80
    18 26  9      18     0.30            18.30       26.30
    19 26 10      18     0.45            18.45       26.45
    20 27  9      20     0.00            20.00       27.00