I am repeatedly doing data.table commands that apply the same function to a bunch of columns, with a group.by statement, and I'm having trouble simplifying/generalizing the command. For example, "output" below is correct, but I think it should be possible to write a more concise command to achieve the same output. "output_fail" is my attempt to do so:
library(data.table)
df <- data.table(a = c(1, 1, 2, 2),
b = c(2, 4, 6, 6),
c = c(1, 3, 7, 10),
d = c(1, 5, 1, 5)
)
output = df[, .(b = sum(b, na.rm = TRUE),
c = sum(c, na.rm = TRUE),
d = sum(d, na.rm = TRUE)
),
by = a]
cols = c('b', 'c', 'd')
output_fail <- df[,(cols) := lapply(cols, function(x) sum(x, na.rm = TRUE))
, by = a
]
How could I write a line to produce output more cleanly given "cols"?
We could use mget
to get the columns in cols
and apply the sum
function to them:
df[, lapply(mget(cols), sum, na.rm = TRUE), by = a]
a b c d
1: 1 6 4 6
2: 2 12 17 6