Search code examples
rdatatabledata.tablelapply

Change multiple columns in data.table using lapply() that are referenced outside of data.table syntax


I have a datatable dt with three columns:

dt <- data.table(A=c("eggs","flour","butter","eggs","eggs"), B = c(8, 9, 7, 5, 1), C = c(5, 1, 0, 2, 3))
dt
        A B C
1:   eggs 8 5
2:  flour 9 1
3: butter 7 0
4:   eggs 5 2
5:   eggs 1 3

I want to multiply column the values of B and C with 10 whenever A == eggs, which can be achieved as follows:

dt[A=="eggs", c("B","C") := lapply(.SD, function(x){x*10}), .SDcols = c("B","C")]
dt
        A  B  C
1:   eggs 80 50
2:  flour  9  1
3: butter  7  0
4:   eggs 50 20
5:   eggs 10 30

I am using lapply() because in my actual data.table, I have many more columns than just B and C whose values need multiplication. For that reason, I would like to make my code look less cluttered by defining columns_to_multiply <- c("B","C") outside of the data.table syntax. In the documentation it is mentioned that this vector of column names can be called from within the syntax using a .. prefix, e.g. something like this:

columns_to_multiply <- c("B","C")
dt[A=="eggs", ..columns_to_multiply := lapply(.SD, function(x){x*10}), .SDcols = ..columns_to_multiply]

which gives me the following error:

Error in eval(colsub, parent.frame(), parent.frame()) : 
object '..columns_to_multiply' not found

If I only use the .. prefix in j as such:

dt[A=="eggs", ..columns_to_multiply := lapply(.SD, function(x){x*10}), .SDcols = columns_to_multiply]

I get the following error message:

Error in `[.data.table`(dt, A=="eggs", `:=`(..columns_to_multiply, lapply(.SD,  : 
Supplied 2 items to be assigned to 3 items of column '..columns_to_multiply'. If you wish to 'recycle' the RHS please use rep() to make this intent clear to readers of your code.

Is there a clean and simple way to execute the operation that I want by properly referencing to my columns_to_multiply vector within the data.table syntax?


Solution

  • It is simpler than what you tried -

    library(data.table)
    columns_to_multiply <- c("B","C")
    
    dt[A=="eggs", (columns_to_multiply) := lapply(.SD, function(x){x*10}), 
                  .SDcols = columns_to_multiply]
    
    dt
    #        A  B  C
    #1:   eggs 80 50
    #2:  flour  9  1
    #3: butter  7  0
    #4:   eggs 50 20
    #5:   eggs 10 30
    

    And since multiplication can directly be applied to dataframe/data.table you can further simplify it without using lapply -

    dt[A=="eggs", (columns_to_multiply) := .SD * 10, .SDcols = columns_to_multiply]