I am trying to do a min
/max
aggregate on a dynamically chosen column in a data.table
. It works perfectly for numeric
columns but I cannot get it to work on Date
columns unless I create a temporary data.table
.
It works when I use the name:
dt <- data.table(Index=1:31, Date = seq(as.Date('2015-01-01'), as.Date('2015-01-31'), by='days'))
dt[, .(minValue = min(Date), maxValue = max(Date))]
# minValue maxValue
# 1: 2015-01-01 2015-01-31
It does not work when I use with=FALSE
:
colName = 'Date'
dt[, .(minValue = min(colName), maxValue = max(colName)), with=F]
# Error in `[.data.table`(dt, , .(minValue = min(colName), maxValue = max(colName)), :
# could not find function "."
I can use .SDcols
on a numeric column:
colName = 'Index'
dt[, .(minValue = min(.SD), maxValue = max(.SD)), .SDcols=colName]
# minValue maxValue
# 1: 1 31
But I get an error when I do the same thing for a Date column:
colName = 'Date'
dt[, .(minValue = min(.SD), maxValue = max(.SD)), .SDcols=colName]
# Error in FUN(X[[i]], ...) :
# only defined on a data frame with all numeric variables
If I use lapply(.SD, min)
or sapply()
then the dates are changed to numbers.
The following works and does not seem to waste memory and is fast. Is there anything better?
a <- dt[, colName, with=F]
setnames(a, 'a')
a[, .(minValue = min(a), maxValue = max(a))]
On your first attempt:
dt[, .(minValue = min(colName), maxValue = max(colName)), with=F]
# Error in `[.data.table`(dt, , .(minValue = min(colName), maxValue = max(colName)), :
# could not find function "."
You should simply read the Introduction to data.table vignette to understand what with=
means. It's easier if you're aware of with()
function from base R.
On the second one:
dt[, .(minValue = min(.SD), maxValue = max(.SD)), .SDcols=colName]
# Error in FUN(X[[i]], ...) :
# only defined on a data frame with all numeric variables
This seems like an issue with min()
and max()
on a data.frame/data.table with column with attributes. Here's a MRE.
df = data.frame(x=as.Date("2015-01-01"))
min(df)
# Error in FUN(X[[i]], ...) :
# only defined on a data frame with all numeric variables
To answer your question, you can use get()
:
dt[, .(min = min(get(colName)), max = max(get(colName)))]
Or as @Frank suggested, [[
operator to subset the column:
dt[, .(min = min(.SD[[colName]]), max = max(.SD[[colName]]))]
There's not yet a nicer way of applying .SD
to multiple functions (because base R doesn't seem to have one AFAICT, and data.table tries to use base R functions as much as possible). There's a FR #1063 to address this issue. If/when that gets implemented, then one could do, for example:
# NOTE: not yet implemented, FR #1063
dt[, colwise(.SD, min, max), .SDcols = colName]