I happen to stumble upon a problem with frank(). I have a data.table
with multiple columns, containing and itemID and its characteristics. Now i would like to order the items according to its characteristics: total
and RD
, which are both values. The higher the values of both columns, the higher the ranks. Additionally, I want to rank for each type
separately.
testDT <- testDT[,rankRD := frank(RD), by = c("type")]
testDT <- testDT[,rankTotal := frank(total), by = c("type")]
I have build an get around:
Sumrank <- rowSums(testDT[,c("rankRD", "rankTotal")])
testDT <- cbind(testDT, Sumrank)
testDT <- testDT[,rank := frank(-Sumrank), by = c("type")]
However, this is not a very elegant solution. I was looking for a better solution, that is suitable for large data too - ideally data.table
Example data:
item <- c(rep(c("plan20, plan21, plan22"), 2), "plan23", "plan24", "plan25")
item <- c("plan20", "plan21", "plan22", "plan20", "plan21", "plan22", "plan23","plan24", "plan25")
total <- c(rep(c(3000, 5000, 7000), 2), c(5000, 5000, 2000))
RD <- c(rep(c(500, 300, 700), 2), c(500, 200, 100))
type <- c(rep("A", 3), rep("B", 3), rep("C", 3))
Here is an option for this equal weighted rank calculations:
#convert type into integer type for easier joining later and
#create an running index for RD
testDT[order(type, RD), c("tid", "rd") := .(rleid(type), rleid(type, RD))]
#create an running index for total
testDT[order(tid, total), rt := rleid(tid, total)]
#create an running index for sum of indices in descending order
testDT[order(tid, -(rd+rt)), rr := rleid(tid, -(rd+rt))]
#reset the index to start with 1 for each type
testDT[testDT[, .(m=max(rr)), tid][, tid := tid + 1L], on=.(tid), rr := rr - m]
output:
item type total RD rankRD rankTotal Sumrank rank tid rd rt rr
1: plan20 A 3000 500 2 1.0 3.0 2.5 1 2 1 2
2: plan21 A 5000 300 1 2.0 3.0 2.5 1 1 2 2
3: plan22 A 7000 700 3 3.0 6.0 1.0 1 3 3 1
4: plan20 B 3000 500 2 1.0 3.0 2.5 2 5 4 2
5: plan21 B 5000 300 1 2.0 3.0 2.5 2 4 5 2
6: plan22 B 7000 700 3 3.0 6.0 1.0 2 6 6 1
7: plan23 C 5000 500 3 2.5 5.5 1.0 3 9 8 1
8: plan24 C 5000 200 2 2.5 4.5 2.0 3 8 8 2
9: plan25 C 2000 100 1 1.0 2.0 3.0 3 7 7 3
Explanation:
It sorts by type
and either of the numeric column first and then create an index using rleid
. Then it sums these 2 indices and repeats for the sum of those indices while giving the top rank to those with largest sum of the previous indices.