Search code examples
rdata.tableapplylapply

In R, simplifying long data.table commands (probably combining Data.table's "group by", lapply, and a vector of column names)


I am repeatedly doing data.table commands that apply the same function to a bunch of columns, with a group.by statement, and I'm having trouble simplifying/generalizing the command. For example, "output" below is correct, but I think it should be possible to write a more concise command to achieve the same output. "output_fail" is my attempt to do so:

library(data.table)

df <- data.table(a = c(1, 1, 2, 2),
                 b = c(2, 4, 6, 6),
                 c = c(1, 3, 7, 10),
                 d = c(1, 5, 1, 5)
                 )

output = df[, .(b = sum(b, na.rm = TRUE),
                c = sum(c, na.rm = TRUE), 
                d = sum(d, na.rm = TRUE)
                ),
            by = a]


cols = c('b', 'c', 'd')

output_fail <- df[,(cols) := lapply(cols, function(x) sum(x, na.rm = TRUE))
                           , by = a
]

How could I write a line to produce output more cleanly given "cols"?


Solution

  • We could use mget to get the columns in cols and apply the sum function to them:

    df[, lapply(mget(cols), sum, na.rm = TRUE), by = a]
    
       a  b  c d
    1: 1  6  4 6
    2: 2 12 17 6