Add a column of ranks


I have some data:

test <- data.frame(A=c("aaabbb",
"aaaabb",
"aaaabb",
"aaaaab",
"bbbaaa")
)

and so on. All the elements are the same length, and are already sorted before I get them.

I need to add a new column of ranks ("First", "Second", "Third"). Anything after the third rank can be left blank, and tied values must share a rank. So in the above case, I'd like to get the following output:

   A       B
 aaabbb  First
 aaaabb  Second
 aaaabb  Second
 aaaaab  Third
 bbbaaa
 bbbbaa  

I looked at rank() and some other posts that used it, but I wasn't able to get it to do what I was looking for.


Solution

  • How about this:

    test$B <- match(test$A , unique(test$A)[1:3] )
    test
           A  B
    1 aaabbb  1
    2 aaaabb  2
    3 aaaabb  2
    4 aaaaab  3
    5 bbbaaa NA
    6 bbbbaa NA
    

    This is one of many ways to do it. It is possibly not the best, but it readily springs to mind and is fairly intuitive. You can use unique because you receive the data pre-sorted, so the first three distinct values are the three top ranks.

    Because the data are sorted, another function worth considering is rle, although it's slightly more obscure in this example:

    # as.character() handles A being either a factor or a character column
    rnk <- rle(as.character(test$A))$lengths
    rnk
    # [1] 1 2 1 1 1
    test$B <- c( rep( 1:3 , times = rnk[1:3] ) , rep(NA, sum( rnk[-c(1:3)] ) ) )
    

    rle computes the lengths (and values which we don't really care about here) of runs of equal values in a vector - so again this works because your data are already sorted.

    And if you don't have to have blanks after the third ranked item it's even simpler (and more readable):

    test$B <- rep(seq_along(rnk), times = rnk)
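
The question asks for the labels "First", "Second" and "Third" with blanks afterwards, whereas the snippets above produce 1, 2, 3 and NA. Here is a minimal sketch of one way to bridge that gap, building on the match approach. The sixth row ("bbbbaa") is taken from the desired output in the question, and rank_labels and idx are names introduced here purely for illustration.

```r
# Sample data from the question, including the sixth row shown in the desired output
test <- data.frame(
  A = c("aaabbb", "aaaabb", "aaaabb", "aaaaab", "bbbaaa", "bbbbaa"),
  stringsAsFactors = FALSE
)

# Labels for the ranks we want to show (illustrative; extend the vector if needed)
rank_labels <- c("First", "Second", "Third")

# Position of each value among the first three distinct values; NA for later ones.
# This relies on the data already being sorted, as stated in the question.
idx <- match(test$A, unique(test$A)[seq_along(rank_labels)])

# Map positions to labels and use an empty string where there is no rank
test$B <- ifelse(is.na(idx), "", rank_labels[idx])
test
```

Tied values share a label because match returns the same position for equal strings. If you would rather keep NA than an empty string, rank_labels[idx] on its own should be enough.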