I have some data:
test <- data.frame(A = c("aaabbb",
                         "aaaabb",
                         "aaaabb",
                         "aaaaab",
                         "bbbaaa",
                         "bbbbaa")
)
and so on. All the elements are the same length and are already sorted before I get them.
I need to make a new column of ranks ("First", "Second", "Third"). Anything after the third rank can be left blank, and tied values must get the same rank. So in the above case, I'd like to get the following output:
A B
aaabbb First
aaaabb Second
aaaabb Second
aaaaab Third
bbbaaa
bbbbaa
I looked at rank() and some other posts that used it, but I wasn't able to get it to do what I was looking for.
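A quick check may show why rank() on its own doesn't fit here. It ranks values by their sort order, not by their position in the already-sorted input, and these strings are not in alphabetical order ("aaabbb" comes first in the input but sorts after "aaaabb"). With ties.method = "min" it also leaves gaps after ties. The vector x below is just the question's column A written out; match() against unique() is shown for comparison as one way to get a gap-free rank in input order.

```r
# The question's column A as a plain character vector
x <- c("aaabbb", "aaaabb", "aaaabb", "aaaaab", "bbbaaa", "bbbbaa")

# rank() orders by value ("aaaaab" < "aaaabb" < "aaabbb" < ...).
# With ties.method = "min", tied values share the lowest rank, but the
# ranks follow alphabetical order and skip a number after each tie.
rank(x, ties.method = "min")
# [1] 4 2 2 1 5 6

# Gap-free rank in input order: position of each value among the
# distinct values, taken in the order they first appear
match(x, unique(x))
# [1] 1 2 2 3 4 5
```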
How about this:
test$B <- match(test$A, unique(test$A)[1:3])
test
A B
1 aaabbb 1
2 aaaabb 2
3 aaaabb 2
4 aaaaab 3
5 bbbaaa NA
6 bbbbaa NA
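One of many ways to do this. Possibly not the best, but one that readily springs to mind and is fairly intuitive. You can use unique() because you receive the data pre-sorted: it returns the distinct values in order of first appearance, so the first three of them are the top three ranks, and match() gives NA for every value outside those three.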
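The question asks for the labels "First", "Second" and "Third" with blanks after that, rather than 1, 2, 3 and NA. A minimal sketch, assuming an empty string is an acceptable "blank", is to index a vector of labels with the match() result and then replace the resulting NAs. The name rank_labels is just illustrative.

```r
test <- data.frame(A = c("aaabbb", "aaaabb", "aaaabb",
                         "aaaaab", "bbbaaa", "bbbbaa"),
                   stringsAsFactors = FALSE)

rank_labels <- c("First", "Second", "Third")

# 1, 2, 3 for the first three distinct values, NA for everything after
idx <- match(test$A, unique(test$A)[1:3])

# Look up the label for each rank; an NA index yields an NA label
test$B <- rank_labels[idx]

# Use an empty string as the "blank" for rows ranked below third
test$B[is.na(test$B)] <- ""
test
```

Indexing with NA keeps the row count unchanged, so no special handling is needed for rows beyond the third rank until the final replacement.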
As the data are sorted, another suitable function worth considering is rle(), although it's slightly more obscure in this example:
# as.character() lets rle() accept A whether it is a factor or a character column
rnk <- rle(as.character(test$A))$lengths
rnk
# [1] 1 2 1 1 1
test$B <- c(rep(1:3, times = rnk[1:3]), rep(NA, sum(rnk[-(1:3)])))
rle() computes the lengths (and the values, which we don't really care about here) of runs of equal values in a vector, so again this works only because your data are already sorted.
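To see what rle() returns on this data, and a variant that caps the ranks at three without assuming there are at least three runs, here is a sketch. It uses as.character() so it works whether A is a factor or a character column (since R 4.0.0, data.frame() no longer converts strings to factors by default). The names runs and ranks are illustrative.

```r
test <- data.frame(A = c("aaabbb", "aaaabb", "aaaabb",
                         "aaaaab", "bbbaaa", "bbbbaa"))

# Runs of equal values; only meaningful because equal values are adjacent
runs <- rle(as.character(test$A))
runs$lengths
# [1] 1 2 1 1 1
runs$values
# [1] "aaabbb" "aaaabb" "aaaaab" "bbbaaa" "bbbbaa"

# Give run i the rank i, repeated once per element in that run
ranks <- rep(seq_along(runs$lengths), times = runs$lengths)

# Blank out (NA) everything ranked below third
ranks[ranks > 3] <- NA
test$B <- ranks
test$B
# [1]  1  2  2  3 NA NA
```

Because the cap is applied after building the full rank vector, this version still works when there are fewer than three distinct values, where rnk[1:3] would contain NA and make rep() fail.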
And if you don't need blanks after the third-ranked item, it's even simpler (and more readable):
test$B <- rep(seq_along(rnk), times = rnk)