I'm trying to create a fixed number of columns (four in this case) with the rep function. Column B gives how many times the number 1 should be repeated across those columns; the remaining columns should be filled with 0.
But I get the error message
"Error in rep(1, B) : invalid 'times' argument"
and I don't know how to fix it.
library(data.table)
test <- data.table(A = c("XYZ", "ZYX", "WER"),
                   B = c(1, 3, 2))
cols <- LETTERS[3:6]
test[, (cols) := c(rep(1, B), rep(0, length(cols) - B))]
#result should be
result <- data.table(A = c("XYZ", "ZYX", "WER"),
                     B = c(1, 3, 2),
                     C = c(1, 1, 1),
                     D = c(0, 1, 1),
                     E = c(0, 1, 0),
                     F = c(0, 0, 0))
This uses a bit of non-'data.table' logic, but should be pretty quick still:
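test[, (cols) := {
  # square matrix with 1s on the diagonal and 0s elsewhere
  D <- diag(length(cols))
  # fill the lower triangle so that row n has 1s in its first n columns
  D[lower.tri(D)] <- 1
  # pick row B[i] for each row i of test; the matrix columns become the new columns
  data.table(D[B,])
}]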
#     A B C D E F
#1: XYZ 1 1 0 0 0
#2: ZYX 3 1 1 1 0
#3: WER 2 1 1 0 0
It works because it builds a matrix with the diagonal and lower triangle filled with 1s, so that row n has 1s in its first n columns, and then uses test$B to select rows of this matrix.
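To see the matrix trick in isolation, here is a minimal base R sketch for the four-column example. The names `n_cols` and `B` are stand-ins for `length(cols)` and `test$B`, not anything the code above defines:

```r
n_cols <- 4          # stand-in for length(cols)
B <- c(1, 3, 2)      # stand-in for test$B

D <- diag(n_cols)    # 1s on the diagonal, 0s elsewhere
D[lower.tri(D)] <- 1 # fill everything below the diagonal
D
#      [,1] [,2] [,3] [,4]
# [1,]    1    0    0    0
# [2,]    1    1    0    0
# [3,]    1    1    1    0
# [4,]    1    1    1    1

picked <- D[B, ]     # one matrix row per element of B
picked
#      [,1] [,2] [,3] [,4]
# [1,]    1    0    0    0
# [2,]    1    1    1    0
# [3,]    1    1    0    0
```

One thing to watch: indexing with 0 selects no row at all, so this approach assumes every value in B is between 1 and the number of columns.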
Alternatively, you could loop over the positions of the columns to assign and, for each position, check whether it is less than or equal to B:
test[, (cols) := lapply(1:length(cols), function(x) as.numeric(x <= B))]
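What the `lapply` call returns is easier to see outside of data.table. This is a small base R sketch with `B` and `n_cols` as stand-ins for `test$B` and `length(cols)`; the name `new_cols` is only for illustration:

```r
B <- c(1, 3, 2)   # stand-in for test$B
n_cols <- 4       # stand-in for length(cols)

# One list element per new column: position x gets a 1 wherever x <= B
new_cols <- lapply(seq_len(n_cols), function(x) as.numeric(x <= B))

# Bind the list side by side to view it as the block of new columns
as_matrix <- do.call(cbind, new_cols)
as_matrix
#      [,1] [,2] [,3] [,4]
# [1,]    1    0    0    0
# [2,]    1    1    1    0
# [3,]    1    1    0    0
```

Because the comparison `x <= B` is vectorised over all rows at once, each column is built in a single pass, which is probably why this version comes out faster in the timings.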
Some comparative timings adding 24 columns and 3M rows:
cols <- LETTERS[-(1:2)]
test <- test[rep(1:3,1e6),]
system.time(test[, (cols) := {
  D <- diag(length(cols))
  D[lower.tri(D)] <- 1
  data.table(D[B,])
}])
#   user  system elapsed
#  0.937   0.651   1.591
Beaten by my second effort:
system.time(
test[, (cols) := lapply(1:length(cols), function(x) as.numeric(x <= B))]
)
#   user  system elapsed
#  0.313   0.132   0.446
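As for the original error: without a `by`, `B` inside `test[...]` is the whole column, so `rep(1, B)` receives a `times` vector of length 3 for an `x` of length 1, and `rep` only accepts a `times` of length 1 or of the same length as `x`. A small base R sketch of that, where `n_cols`, `res` and `rows` are illustrative names only:

```r
B <- c(1, 3, 2)   # stand-in for test$B
n_cols <- 4       # stand-in for length(cols)

# Capture the error message instead of stopping
res <- tryCatch(rep(1, B), error = function(e) conditionMessage(e))
res

# With a single value of B at a time, the original idea works;
# sapply returns one column per element of B, so transpose to get rows
rows <- t(sapply(B, function(b) c(rep(1, b), rep(0, n_cols - b))))
rows
```

This per-row version is only meant to show where the error comes from; the vectorised approaches above should scale much better.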