Add multiple columns with rep in data.table


I'm trying to create a fixed number of columns (4 in this case) with the rep function. Column B gives how many times the value 1 should be repeated across those columns, starting from the first one. The remaining columns should be filled with 0.

But I get this error message and don't know how to fix it:

"Error in rep(1, B) : invalid 'times' argument"

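library(data.table)

test <- data.table(A = c("XYZ", "ZYX", "WER"),
                   B = c(1, 3, 2))
cols <- LETTERS[3:6]  # names of the new columns: "C" "D" "E" "F"
# This is the line that throws the error
test[, (cols) := c(rep(1, B), rep(0, length(cols) - B))]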

# the result should be
result <- data.table(A = c("XYZ", "ZYX", "WER"),
                     B = c(1, 3, 2),
                     C = c(1, 1, 1),
                     D = c(0, 1, 1),
                     E = c(0, 1, 0),
                     F = c(0, 0, 0))

Solution

  • This relies on a bit of base R matrix logic rather than pure 'data.table', but it should still be pretty quick:

    test[, (cols) := {
      D <- diag(length(cols))
      D[lower.tri(D)] <- 1
      data.table(D[B,])
    }]
    
    #     A B C D E F
    #1: XYZ 1 1 0 0 0
    #2: ZYX 3 1 1 1 0
    #3: WER 2 1 1 0 0
    

    It works because it creates a square matrix with ones on and below the diagonal, so row i holds i ones followed by zeros. It then uses test$B to pick one row of this matrix for each row of the table.
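
    To make the row lookup concrete, here is a small base R sketch of the matrix for the four columns in the question. It assumes every value of B lies between 1 and the number of new columns; a B of 0 would select no row at all, so that case would need separate handling.

    ```r
    # Build the 4 x 4 lookup matrix: ones on and below the diagonal
    D <- diag(4)
    D[lower.tri(D)] <- 1
    D
    #      [,1] [,2] [,3] [,4]
    # [1,]    1    0    0    0
    # [2,]    1    1    0    0
    # [3,]    1    1    1    0
    # [4,]    1    1    1    1

    # Indexing rows by B picks one ready-made row per record
    B <- c(1, 3, 2)
    picked <- D[B, , drop = FALSE]  # drop = FALSE keeps a matrix even for a single row
    picked
    #      [,1] [,2] [,3] [,4]
    # [1,]    1    0    0    0
    # [2,]    1    1    1    0
    # [3,]    1    1    0    0
    ```

    The three rows of picked correspond to columns C to F of the desired result.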

    Alternatively, you could loop over the positions of the columns to assign and, for each position, check whether it is less than or equal to B:

    test[, (cols) := lapply(seq_along(cols), function(x) as.numeric(x <= B))]
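
    As a rough illustration of why this works: in the question, rep(1, B) receives the whole B column as its times argument, whereas each comparison x <= B is vectorised over B and returns one full column. The sketch below uses a plain vector in place of the table; n_cols and indicator_cols are illustrative names, not part of the original code.

    ```r
    B <- c(1, 3, 2)
    n_cols <- 4  # number of indicator columns (C to F in the question)

    # One vectorised comparison per new column; each yields a full-length vector
    indicator_cols <- lapply(seq_len(n_cols), function(x) as.numeric(x <= B))

    indicator_cols[[2]]  # second new column: is 2 <= B?
    # [1] 0 1 1
    ```

    Because `:=` accepts a list with one element per column on the left-hand side, this list can be assigned to cols directly.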
    

    Some comparative timings, adding 24 columns to a table of 3M rows:

    cols <- LETTERS[-(1:2)]
    test <- test[rep(1:3,1e6),]
    
    system.time(test[, (cols) := {
      D <- diag(length(cols))
      D[lower.tri(D)] <- 1
      data.table(D[B,])
    }])
    
    #   user  system elapsed 
    #  0.937   0.651   1.591 
    

    This is beaten by my second effort, which is roughly 3.5 times faster in elapsed time:

    system.time(
      test[, (cols) := lapply(seq_along(cols), function(x) as.numeric(x <= B))]
    )
    #   user  system elapsed 
    #  0.313   0.132   0.446
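
    Before reading too much into the timings, it may be worth checking that the two approaches agree. A minimal base R check on the three values from the question, using plain vectors instead of the table (from_matrix and from_lapply are hypothetical names used only here):

    ```r
    B <- c(1, 3, 2)
    n <- 4  # number of new columns

    # Approach 1: row lookup in a matrix of ones on and below the diagonal
    D <- diag(n)
    D[lower.tri(D)] <- 1
    from_matrix <- D[B, , drop = FALSE]

    # Approach 2: one vectorised comparison per column, bound into a matrix
    from_lapply <- do.call(cbind, lapply(seq_len(n), function(x) as.numeric(x <= B)))

    all.equal(from_matrix, from_lapply)
    # [1] TRUE
    ```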