I want to rank values within groups using two attributes. In theory frank
in data.table
should be able to do this, but I am running into a snag where the function does not accept two columns when put into the %>%
syntax or in the data.table
syntax to split into groups.
Using the mtcars
dataset, I am trying to order by cyl and disp within two dummy groups. I can order the data within groups using order
as suggested in another post (that I can't track down) but I cannot set a method to deal with ties in ranks. I can also use frank
to order by cyl and then disp but I cannot get that function to do within the groups.
data(mtcars)
mtcars<-as.data.table(mtcars)%>%select(cyl, disp, gear)
mtcars$gr1<-rep(c("fast", "slow"), 16)
mtcars$gr2<-rep(c("red", "blue"), each=16)
This is the code that I am trying to use but which fails but I do not understand the hints in the error message:
mtcars%>%group_by(gr1, gr2)%>%
arrange(gr1, gr2, .by.group=TRUE)%>% #by.group to keep within groups : may not be helpful for frank
mutate(ranks=frank(-cyl, -disp, na.last=TRUE, ties.method="min"))#orders by cyl and then by disp
#Error in `mutate()`:
# ℹ In argument: `ranks = frank(-cyl, -disp, na.last = TRUE, ties.method = "min")`.
#ℹ In group 1: `gr1 = "fast"`, `gr2 = "blue"`.
#Caused by error in `frankv()`:
# ! x is a single vector, non-NULL 'cols' doesn't make sense
#Run `rlang::last_error()` to see where the error occurred.
#works but only uses cyl
mtcars[, list(ranks= frank(-cyl, na.last=TRUE, ties.method="min")),
by=list(gr1, gr2)]
#fails
mtcars[, list(ranks= frank(-cyl, -disp, na.last=TRUE, ties.method="min")),
by=list(gr1, gr2)]
Looking at the ?frank
help page, it does not accept multiple vectors as input as you are trying to use it. It either accepts (a) a single vector or (b) a list, data frame, or data.table as the first argument, along with the column names (or a vector of quoted column names or indices).
From ?frank
:
Arguments
x
Avector
, orlist
with all its elements identical in length ordata.frame
ordata.table
.
...
Only forlists
,data.frames
anddata.tables
. The columns to calculate ranks based on. Do not quote column names. If...
is missing, all columns are considered by default. To sort by a column in descending order prefix "-
", e.g.,frank(x, a, -b, c)
.-b
works whenb
is of type character as well.
Note that ...
works "only for for lists
, data.frames
and data.tables
."
You don't want to rank a single vector, which means you need to use option (b) - give frank
the data.table as its first argument, and then specify the columns. To do this by group in data.table
we use .SD
. (I don't think this interface will play nicely with dplyr::group_by
.)
Here's an all data.table
version of my interpretation of your code. You haven't provided expected output, so I can only hope it's what you want.
mtcars[, result := frank(.SD, -cyl, -disp, na.last = TRUE, ties.method = "min"), by = .(gr1, gr2)]
mtcars
# cyl disp gear gr1 gr2 result
# 1: 6 160.0 4 fast red 6
# 2: 6 160.0 4 slow red 7
# 3: 4 108.0 4 fast red 8
# 4: 6 258.0 3 slow red 4
# 5: 8 360.0 3 fast red 2
# 6: 6 225.0 3 slow red 5
# 7: 8 360.0 3 fast red 2
# 8: 4 146.7 4 slow red 8
# 9: 4 140.8 4 fast red 7
# 10: 6 167.6 4 slow red 6
# 11: 6 167.6 4 fast red 5
# 12: 8 275.8 3 slow red 2
# 13: 8 275.8 3 fast red 4
# 14: 8 275.8 3 slow red 2
# 15: 8 472.0 3 fast red 1
# 16: 8 460.0 3 slow red 1
# 17: 8 440.0 3 fast blue 1
# 18: 4 78.7 4 slow blue 7
# 19: 4 75.7 4 fast blue 8
# 20: 4 71.1 4 slow blue 8
# 21: 4 120.1 3 fast blue 7
# 22: 8 318.0 3 slow blue 2
# 23: 8 304.0 3 fast blue 4
# 24: 8 350.0 3 slow blue 1
# 25: 8 400.0 3 fast blue 2
# 26: 4 79.0 4 slow blue 6
# 27: 4 120.3 5 fast blue 6
# 28: 4 95.1 5 slow blue 5
# 29: 8 351.0 5 fast blue 3
# 30: 6 145.0 5 slow blue 3
# 31: 8 301.0 5 fast blue 5
# 32: 4 121.0 4 slow blue 4
# cyl disp gear gr1 gr2 result
Thanks to Jamie's comment, we can construct a list
on the fly of the ranking columns which we can do within a dplyr
pipeline:
## using list() with data.table
mtcars[, result := frank(list(-cyl, -disp), na.last = TRUE, ties.method = "min"), by = .(gr1, gr2)]
## using list() with dplyr
mtcars %>%
group_by(gr1, gr2) %>%
mutate(result = frank(list(-cyl, -disp), na.last=TRUE, ties.method="min"))