
Reserved words for R data.table column names?


I'm finding some sharp edges regarding specific column names in data.table. How can I avoid cutting myself on them? Assume I have a data.table with two columns, 'type' and 'value'.

library(data.table)
numRows = 100
numTypes = 10
dt = data.table(type=sample(numTypes, numRows, replace=TRUE),
                value=rnorm(numRows))

If I want to quickly calculate the mean for all rows with type==3, this works great:

dt[type==3, mean(value)]
# [1] 0.08086124

But what if "someone who is not me" came along and decided that 'type' is a poor name for the column, and that it really should be 'class'?

setnames(dt, "type", "class")

Now when I try the equivalent operation I get scary error messages:

dt[class==3, mean(value)]
# Error in setattr(attr(x, "index"), paste(cols, collapse = "__"), o) : 
#  attempt to set invalid 'class' attribute

Is this expected behavior (for 1.9.4 on OSX)? I presume it happens because 'class' is a function name in R, and something internal to data.table is interpreting it as such. Wrapping the i clause in parentheses seems to solve the problem:

dt[(class==3), mean(value)]
# [1] 0.08086124

But maybe there are cases where this workaround fails too?

Is there a list of column names that are expected to fail in this case?

Can user defined functions or loaded libraries cause the same error?

Is there in general a safer way to do this that I should be using?


Solution

  • This seems to have been fixed already in a later release, so update your data.table package. With a current version, the same code runs without error:

    library(data.table)
    set.seed(1)
    numRows = 100
    numTypes = 10
    dt = data.table(type=sample(numTypes, numRows, replace=TRUE),
                    value=rnorm(numRows))
    setnames(dt,"type","class")
    dt[class==3, mean(value)]
    # [1] -0.2300146
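
If updating is not possible right away, here is a minimal sketch of a few more defensive ways to write the same filter. It assumes the problem comes from the optimised `column == value` path in `i`, which the `setattr(attr(x, "index"), ...)` call in the error message suggests, rather than from a clash with the `class()` function. The variable names (`m_paren`, `m_dollar`, `m_base`) are made up for illustration.

```r
library(data.table)

set.seed(1)
dt <- data.table(class = sample(10, 100, replace = TRUE),
                 value = rnorm(100))

# Check which data.table version is installed before relying on the fix.
print(packageVersion("data.table"))

# 1. Parenthesised i, the workaround from the question.
m_paren <- dt[(class == 3), mean(value)]

# 2. Build the logical vector explicitly, so i is not a bare `column == value`.
m_dollar <- dt[dt$class == 3, mean(value)]

# 3. Plain base R subsetting on the underlying vectors, bypassing `[.data.table`.
m_base <- mean(dt[["value"]][dt[["class"]] == 3])

# All three should give the same mean.
print(c(paren = m_paren, dollar = m_dollar, base = m_base))
```

Options 2 and 3 are more verbose, but they do not depend on how a given data.table release handles a bare comparison in `i`, which may make them a reasonable fallback for code that has to run against older installations.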