Search code examples
rdata.tableranking

frank () form multiple columns in R


I happen to stumble upon a problem with frank(). I have a data.table with multiple columns, containing and itemID and its characteristics. Now i would like to order the items according to its characteristics: total and RD, which are both values. The higher the values of both columns, the higher the ranks. Additionally, I want to rank for each type separately.

enter image description here

testDT <- testDT[,rankRD := frank(RD), by = c("type")]
testDT <- testDT[,rankTotal := frank(total), by = c("type")]

I have build an get around:

Sumrank <- rowSums(testDT[,c("rankRD", "rankTotal")])
testDT <- cbind(testDT, Sumrank)
testDT <- testDT[,rank := frank(-Sumrank), by = c("type")]

However, this is not a very elegant solution. I was looking for a better solution, that is suitable for large data too - ideally data.table

Example data:

item <- c(rep(c("plan20, plan21, plan22"), 2), "plan23", "plan24", "plan25")
item <- c("plan20", "plan21", "plan22", "plan20", "plan21", "plan22", "plan23","plan24", "plan25")
total <- c(rep(c(3000, 5000, 7000), 2), c(5000, 5000, 2000))
RD <- c(rep(c(500, 300, 700), 2), c(500, 200, 100))
type <- c(rep("A", 3), rep("B", 3), rep("C", 3))

Solution

  • Here is an option for this equal weighted rank calculations:

    #convert type into integer type for easier joining later and
    #create an running index for RD
    testDT[order(type, RD), c("tid", "rd") := .(rleid(type), rleid(type, RD))]
    
    #create an running index for total
    testDT[order(tid, total), rt := rleid(tid, total)]
    
    #create an running index for sum of indices in descending order
    testDT[order(tid, -(rd+rt)), rr := rleid(tid, -(rd+rt))]
    
    #reset the index to start with 1 for each type
    testDT[testDT[, .(m=max(rr)), tid][, tid := tid + 1L], on=.(tid), rr := rr - m]
    

    output:

         item type total  RD rankRD rankTotal Sumrank rank tid rd rt rr
    1: plan20    A  3000 500      2       1.0     3.0  2.5   1  2  1  2
    2: plan21    A  5000 300      1       2.0     3.0  2.5   1  1  2  2
    3: plan22    A  7000 700      3       3.0     6.0  1.0   1  3  3  1
    4: plan20    B  3000 500      2       1.0     3.0  2.5   2  5  4  2
    5: plan21    B  5000 300      1       2.0     3.0  2.5   2  4  5  2
    6: plan22    B  7000 700      3       3.0     6.0  1.0   2  6  6  1
    7: plan23    C  5000 500      3       2.5     5.5  1.0   3  9  8  1
    8: plan24    C  5000 200      2       2.5     4.5  2.0   3  8  8  2
    9: plan25    C  2000 100      1       1.0     2.0  3.0   3  7  7  3
    

    Explanation:

    It sorts by type and either of the numeric column first and then create an index using rleid. Then it sums these 2 indices and repeats for the sum of those indices while giving the top rank to those with largest sum of the previous indices.