
Why does selecting columns with an element of a character vector require get() in data.table?


Context

I store variable names of interest in character vectors. I usually keep those vectors in a nested list (e.g. variables$predictors$model1) to reduce clutter and organize them better. For this reason, I mostly work with sublists and list indexing. However, I am having a hard time translating this workflow to data.table.

Problem

Consider the simple task of subsetting a data.table to the columns whose names are stored in a character vector. As the attempts below show, the commonly suggested ways of subsetting do not give the intended output. More annoyingly, the desired output requires get() (wrapped in a list), which occasionally behaves in undesired ways.

Is this really the most efficient way available within data.table for this simple action?

Why do options 1 to 4 return just the string?

library(data.table)

# Create data.table with three variables
dt                       <- data.table(a = c(1:3, NA), b = 1:4, c = c(NA, 1:3))

# Define column names of interest
column_names_of_interest <- c("b", "c")

# Subset by one of the column names
# Attempted approaches

# 1
dt[, column_names_of_interest[1]]
# [1] "b"

# 2
dt[, column_names_of_interest[[1]]]
# [1] "b"

# 3
dt[, ..column_names_of_interest[1]]
# [1] "b"

# 4
dt[, ..column_names_of_interest[[1]]]
# [1] "b"

# 5
dt[, get(column_names_of_interest[1])]
# [1] 1 2 3 4

# 6
dt[, .(get(column_names_of_interest[1]))]
#     V1
# 1:  1
# 2:  2
# 3:  3
# 4:  4

Solution

Pass the character vector to .SDcols and select .SD:

dt <- data.table(a = c(1:3, NA), b = 1:4, c = c(NA, 1:3))

column_names_of_interest <- c("b", "c")

dt[, .SD, .SDcols = column_names_of_interest]

#    b  c
# 1: 1 NA
# 2: 2  1
# 3: 3  2
# 4: 4  3
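
As for why options 1 to 4 return just the string: as far as I understand it, j is evaluated as an ordinary R expression, so column_names_of_interest[1] simply evaluates to "b". The .. prefix is documented for a plain variable name, not for an indexing expression. The sketch below shows a few ways to select a single column by an element of the vector without get(). The names one_col_dt, one_col_wf, one_col_dd, one_col_vec and col are made up for illustration, and the .. form assumes a reasonably recent data.table version.

```r
library(data.table)

dt <- data.table(a = c(1:3, NA), b = 1:4, c = c(NA, 1:3))
column_names_of_interest <- c("b", "c")

# One column as a data.table: index the vector inside .SDcols
one_col_dt <- dt[, .SD, .SDcols = column_names_of_interest[1]]

# with = FALSE tells data.table to treat the value of j as column names
one_col_wf <- dt[, column_names_of_interest[1], with = FALSE]

# The .. prefix is applied to a plain variable, so index the vector first
col <- column_names_of_interest[1]
one_col_dd <- dt[, ..col]

# One column as a plain vector (what option 5 returned), without get()
one_col_vec <- dt[[column_names_of_interest[1]]]
```

The first three forms keep the column name b in the result, unlike option 6, which names the column V1.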