frank - specifying multiple columns from data.table in R


I am quite confused about the frank function. The documentation of its ... argument says:

Only for lists, data.frames and data.tables. The columns to calculate ranks based on. Do not quote column names. If ... is missing, all columns are considered by default. To sort by a column in descending order prefix a "-", e.g., frank(x, a, -b, c). The -b works when b is of type character as well.
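Here is a small sketch of the documented syntax on a hypothetical toy table (DT, a and b are made up for this example). With frank the columns are bare names and "-b" requests descending order. The frankv equivalent passes quoted names in cols and the direction per column in its order argument (1L ascending, -1L descending).

```r
library(data.table)

# Hypothetical toy table: a is numeric, b is character
DT <- data.table(a = c(1, 1, 2, 2), b = c("x", "y", "x", "y"))

# frank(): unquoted column names; "-b" ranks b in descending order
r1 <- frank(DT, a, -b, ties.method = "dense")

# frankv(): quoted column names in `cols`, direction given via `order`
r2 <- frankv(DT, cols = c("a", "b"), order = c(1L, -1L), ties.method = "dense")

print(r1)
print(identical(r1, r2))
```

Both calls rank by a ascending and then by b descending, so (1, "y") comes first and (2, "x") comes last.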

So I have my data:

testDT <- structure(list(
  product = c("Product 1", "Product 1", "Product 1", "Product 1", "Product 1",
              "Product 5", "Product 5", "Product 5", "Product 5", "Product 5"),
  policyID = c("A738-33", "A738-33", "A738-33", "A738-33", "A738-33",
               "A738-33", "A738-33", "A738-33", "A738-33", "A738-33"),
  startYear = c(2014, 2015, 2016, 2017, 2018, 2014, 2015, 2016, 2017, 2018),
  total = c("30000", "30000", "30000", "30000", "30000",
            "10000", "10000", "10000", "10000", "10000"),
  daily = c("150", "150", "150", "150", "150", "80", "80", "80", "80", "80")),
  # .internal.selfref pointer removed so the dput() output can be pasted back into R
  class = c("data.table", "data.frame"), row.names = c(NA, -10L), sorted = "product")

I want to rank this data by the columns total and daily, so I did this:

> setDT(testDT)
> frankv(testDT, totallimit, rbddaily, ties.method="dense")
Error in colnamesInt(x, cols, check_dups = TRUE) : 
  argument specifying columns specify non existing column(s): cols[1]='30000'
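The error follows from the signature of frankv(): its second and third positional arguments are cols and order, and both are evaluated as ordinary R values rather than as column names. So totallimit is bound to cols and rbddaily to order. Judging from the message, totallimit presumably resolved to the string "30000" in the calling environment. The sketch below uses a hypothetical two-row table to show both forms:

```r
library(data.table)

# Hypothetical two-row table using the column names from the data above
DT <- data.table(total = c("30000", "10000"), daily = c("150", "80"))

# frankv() expects the column names as a character vector in `cols`
ok <- frankv(DT, cols = c("total", "daily"), ties.method = "dense")
print(ok)

# Bare names are looked up as ordinary objects: `total` binds to `cols`
# and `daily` to `order`. With no such objects in scope R stops with an
# "object not found" error, captured here instead of aborting.
bad <- tryCatch(frankv(DT, total, daily), error = function(e) e)
print(inherits(bad, "error"))
```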

Strangely enough, when I DO use quotes, which is exactly the opposite of what the documentation says, I get results:

frankv(testDT, cols=c("totallimit", "rbddaily"), ties.method="dense")

I also tried integrating this into a data.table call, and another weird thing happened: from the 10 rows of data I had, I obtained 100 rows.

testDT[,.(rank = frankv(testDT, cols=c("limit", "daily"), ties.method="dense")), by = c("policyID", "product", "startYear")]

What am I doing wrong, and how can I fix this? The documentation is not much help; maybe I am missing something...


Solution

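  • frank takes unquoted column names through ..., whereas frankv (the function you used) takes a character vector of column names through its cols argument, so there you do need quotes: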

    library(data.table)
    frank(testDT, total, daily, ties.method="dense")
    
     [1] 2 2 2 2 2 1 1 1 1 1
    
    frankv(testDT, cols=c("total", "daily"), ties.method="dense")
    
     [1] 2 2 2 2 2 1 1 1 1 1
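The 100 rows in the grouped attempt appear to come from referring to testDT inside j. Grouping by policyID, product and startYear gives 10 groups here (one per row), and each group evaluates frankv(testDT, ...) on the full 10-row table, so 10 × 10 = 100 rows come back. The sketch below assumes the goal is one rank per row, computed over the whole table. It uses the column names from the posted data (total, daily) and adds the rank by reference with := and .SDcols. That is one option among several, not the only fix.

```r
library(data.table)

# The question's data, rebuilt directly rather than via dput()
testDT <- data.table(
  product   = rep(c("Product 1", "Product 5"), each = 5),
  policyID  = "A738-33",
  startYear = rep(c(2014, 2015, 2016, 2017, 2018), times = 2),
  total     = rep(c("30000", "10000"), each = 5),
  daily     = rep(c("150", "80"), each = 5)
)

# Grouped call as in the question: inside j, `testDT` is the full table,
# so each of the 10 groups returns 10 ranks (100 rows in total)
grouped <- testDT[, .(rank = frankv(testDT, cols = c("total", "daily"),
                                    ties.method = "dense")),
                  by = c("policyID", "product", "startYear")]
print(nrow(grouped))

# No `by`: rank over the whole table and add the result as a new column.
# .SD contains only the columns named in .SDcols, and frankv() uses all of them.
testDT[, rank := frankv(.SD, ties.method = "dense"), .SDcols = c("total", "daily")]
print(testDT)
```

An equivalent without .SD is testDT[, rank := frank(testDT, total, daily, ties.method = "dense")].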