I have a datatable dt
with three columns:
dt <- data.table(A=c("eggs","flour","butter","eggs","eggs"), B = c(8, 9, 7, 5, 1), C = c(5, 1, 0, 2, 3))
dt
A B C
1: eggs 8 5
2: flour 9 1
3: butter 7 0
4: eggs 5 2
5: eggs 1 3
I want to multiply column the values of B and C with 10 whenever A == eggs
, which can be achieved as follows:
dt[A=="eggs", c("B","C") := lapply(.SD, function(x){x*10}), .SDcols = c("B","C")]
dt
A B C
1: eggs 80 50
2: flour 9 1
3: butter 7 0
4: eggs 50 20
5: eggs 10 30
I am using lapply()
because in my actual data.table, I have many more columns than just B and C whose values need multiplication. For that reason, I would like to make my code look less cluttered by defining columns_to_multiply <- c("B","C")
outside of the data.table syntax. In the documentation it is mentioned that this vector of column names can be called from within the syntax using a ..
prefix, e.g. something like this:
columns_to_multiply <- c("B","C")
dt[A=="eggs", ..columns_to_multiply := lapply(.SD, function(x){x*10}), .SDcols = ..columns_to_multiply]
which gives me the following error:
Error in eval(colsub, parent.frame(), parent.frame()) :
object '..columns_to_multiply' not found
If I only use the ..
prefix in j as such:
dt[A=="eggs", ..columns_to_multiply := lapply(.SD, function(x){x*10}), .SDcols = columns_to_multiply]
I get the following error message:
Error in `[.data.table`(dt, A=="eggs", `:=`(..columns_to_multiply, lapply(.SD, :
Supplied 2 items to be assigned to 3 items of column '..columns_to_multiply'. If you wish to 'recycle' the RHS please use rep() to make this intent clear to readers of your code.
Is there a clean and simple way to execute the operation that I want by properly referencing to my columns_to_multiply
vector within the data.table syntax?
It is simpler than what you tried -
library(data.table)
columns_to_multiply <- c("B","C")
dt[A=="eggs", (columns_to_multiply) := lapply(.SD, function(x){x*10}),
.SDcols = columns_to_multiply]
dt
# A B C
#1: eggs 80 50
#2: flour 9 1
#3: butter 7 0
#4: eggs 50 20
#5: eggs 10 30
And since multiplication can directly be applied to dataframe/data.table you can further simplify it without using lapply
-
dt[A=="eggs", (columns_to_multiply) := .SD * 10, .SDcols = columns_to_multiply]