data.table: Different result whether assignment is chained or within same square bracket as subsetting


In the working example below, we see that the operations dt[i, j] and dt[i][,j] are not equivalent in their output. I would have assumed that chaining did not make a difference, but clearly it does. What is happening under the hood and why is this intended behavior?

library(data.table)
library(lubridate)

  # Order A
dates <- as.Date(c("2022-08-18", "2022-10-20"))
db <- data.frame(id = 1:2, d = dates)
dt <- data.table(db)

dt[, month := months(dates)]
print(dt) # month is character

dt[id == 1][, month := lubridate::month(d)]
print(dt) # month remains character

dt[id == 1, month := lubridate::month(d)]
print(dt) # row id 1 now shows 8; the value is coerced to fit the existing character column



  # Order B
dates <- as.Date(c("2022-08-18", "2022-10-20"))
db <- data.frame(id = 1:2, d = dates)
dt <- data.table(db)

dt[, month := months(dates)]
print(dt) # month is character

dt[id == 1, month := lubridate::month(d)]
print(dt) # row id 1 now shows 8; the value is coerced to fit the existing character column

dt[id == 1][, month := lubridate::month(d)]
print(dt) # row id 1 still shows 8; the chained call did not change dt

Solution

  • := modifies the input object by reference; see the "Reference semantics" vignette of data.table.

    You can observe this using tracemem():

    tracemem(dt)
    #[1] "<000001CEC9999820>"
    tracemem(dt[, month := months(dates)])
    #[1] "<000001CEC9999820>"
    # Same address
    

    However, in dt[id == 1][, month := lubridate::month(d)], the object that := modifies is dt[id == 1], a new subset of dt stored at a different memory address:

    tracemem(dt[id == 1])
    #[1] "<000001CEC939EBB0>"
    # Different from dt's address; changes made to this object by reference do not modify dt
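
The point above can be checked without reading raw addresses by eye. The following is a minimal sketch, not part of the original answer: it assumes data.table's exported address() helper, and the replacement value "changed" and the names sub, res and untouched are placeholders chosen for illustration. It shows that the chained form updates only a temporary subset, and gives two ways to keep the update.

```r
library(data.table)

dt <- data.table(id = 1:2, d = as.Date(c("2022-08-18", "2022-10-20")))
dt[, month := months(d)]              # adds a character column to dt itself

# dt[id == 1] builds a new data.table, so it has its own address.
sub <- dt[id == 1]
address(sub) == address(dt)           # FALSE

# := inside the chain updates only that temporary subset.
dt[id == 1][, month := "changed"]
untouched <- identical(dt$month, months(dt$d))
untouched                             # TRUE: dt was not modified

# Option 1: keep the chained result by assigning it to a name.
res <- dt[id == 1][, month := "changed"]
res$month                             # "changed"

# Option 2: update dt itself by putting i and := in the same bracket.
dt[id == 1, month := "changed"]
dt$month[1]                           # "changed"
```

Option 2 is the form to use when the goal is to change some rows of dt in place. Option 1 produces a separate one-row table and leaves dt as it was.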