I store variable names of interest in character vectors. Usually, I keep those vectors in a nested list (e.g. variables$predictors$model1) to reduce clutter and organize them better. For this reason, I usually work with sublists and list indexing. However, I am having a hard time translating this workflow to data.table.
Consider the simple task of subsetting a data.table to the columns whose names are stored in a character vector. As the attempts below show, the commonly suggested ways of subsetting do not give the intended output. More annoyingly, the desired output requires get() (together with a list), which occasionally has undesired behavior.
Is this really the most efficient way available within data.table for this simple action?
Why do options 1 to 4 return just the string?
library(data.table)
# Create data.table with three variables
dt <- data.table(a = c(1:3, NA), b = 1:4, c = c(NA, 1:3))
# Define column names of interest
column_names_of_interest <- c("b", "c")
# Subset by one of the column names
# Attempted approaches
# 1
dt[, column_names_of_interest[1]]
# [1] "b"
# 2
dt[, column_names_of_interest[[1]]]
# [1] "b"
# 3
dt[, ..column_names_of_interest[1]]
# [1] "b"
# 4
dt[, ..column_names_of_interest[[1]]]
# [1] "b"
# 5
dt[, get(column_names_of_interest[1])]
# [1] 1 2 3 4
# 6
dt[, .(get(column_names_of_interest[1]))]
#    V1
# 1:  1
# 2:  2
# 3:  3
# 4:  4
dt <- data.table(a = c(1:3, NA), b = 1:4, c = c(NA, 1:3))
column_names_of_interest <- c("b", "c")
dt[, .SD, .SDcols = column_names_of_interest]
#    b  c
# 1: 1 NA
# 2: 2  1
# 3: 3  2
# 4: 4  3
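For the nested-list workflow, a few other forms may be worth trying. As far as I understand it, j is evaluated as an ordinary R expression unless it is a bare ..name or with = FALSE is set, which would explain why column_names_of_interest[1] simply evaluates to the string "b". The sketch below assumes a hypothetical variables list shaped like the one described in the question; cols and the res_* names are only illustrative.

```r
library(data.table)

dt <- data.table(a = c(1:3, NA), b = 1:4, c = c(NA, 1:3))

# Hypothetical nested list of column names, mirroring the question's workflow
variables <- list(predictors = list(model1 = c("b", "c")))

# with = FALSE tells data.table to treat j as a vector of column names
res_with <- dt[, variables$predictors$model1, with = FALSE]

# The .. prefix is applied to a plain variable name, so copy the
# list element into a local variable first
cols <- variables$predictors$model1
res_dots <- dt[, ..cols]

# .SDcols accepts an expression that evaluates to a character vector
res_sd <- dt[, .SD, .SDcols = variables$predictors$model1]

# A single column as a plain vector, without get()
b_vec <- dt[[cols[1]]]
```

The with = FALSE and .SDcols forms take the list element directly, so they seem to fit the sublist-indexing style best; the .. form needs the intermediate variable.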