Tags: r, data.table, cross-join

Create all cross combinations, respecting the column hierarchy


From my data.table DT.in, I want to create all cross-join combinations: not just the full CJ(_all_my_var_), but also the cross join of each leading subset of columns.

I need to do something like this:

CJ(var1)
CJ(var1, var2)
CJ(var1, var2, var3)
...

Here is my reprex:

library(data.table)
set.seed(999)

DT.in <- data.table(lvl1 = rep('AA', 200),
                 code = sample(c('D44', 'J21'), 200, replace = TRUE),
                 var = sample(c('Z3R', 'TR5', 'JKL', 'FR5', 'TFX'), 200, replace = TRUE),
                 test = sample(c('ONE', 'TWO', 'THREE', 'FOUR', 'FIVE', 'SIX', 'SEVEN'), 200, replace = TRUE))

foo.1 <- DT.in[, .(new = CJ(lvl1, unique = TRUE))]
foo.2 <- DT.in[, .(new = CJ(lvl1, code, unique = TRUE))]
foo.3 <- DT.in[, .(new = CJ(lvl1, code, var, unique = TRUE))]
foo.4 <- DT.in[, .(new = CJ(lvl1, code, var, test, unique = TRUE))]

DT.out <- rbindlist(list(foo.1, foo.2, foo.3, foo.4), fill = TRUE)

My DT.out is what I want, but I need to automate this because the names and the number of variables are never the same, and I still can't work out how.

Any help would be much appreciated.

(Maybe there is a completely different way to go from DT.in to DT.out?)


Solution

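  • We can use do.call with CJ:

    1. Loop over the column positions of DT.in with lapply(seq_along(DT.in), ...).
    2. For each position i, select the first i columns by numeric index (with = FALSE).
    3. Pass those columns, plus unique = TRUE, to CJ through do.call.
    4. Bind the resulting list of tables into a single one with rbindlist(fill = TRUE), so columns missing from the shorter tables are filled with NA.
    5. Prefix the column names with "new." using setnames so they match the OP's DT.out.
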
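    library(data.table)

    # One CJ per leading subset of columns (first 1, first 2, ...), stacked with NA fill
    DT.out2 <- rbindlist(lapply(seq_along(DT.in), function(i)
      do.call(CJ, c(DT.in[, seq_len(i), with = FALSE], unique = TRUE))), fill = TRUE)
    # Prefix the names with "new." to match the OP's DT.out
    setnames(DT.out2, paste0("new.", names(DT.out2)))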
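
To make the do.call step less opaque, here is a minimal sketch on a made-up table (`toy`, `args`, `via_do_call` and `direct` are illustrative names, not part of the OP's data). It assumes only that c() on a data.table plus a named element yields a plain named list, which do.call then spreads into CJ's arguments:

```r
library(data.table)

# Hypothetical toy table, not the OP's data
toy <- data.table(a = c("x", "x", "y"), b = c(1L, 2L, 1L))

# c() drops the data.table class and appends unique = TRUE,
# giving the named argument list list(a = ..., b = ..., unique = TRUE)
args <- c(toy[, 1:2, with = FALSE], unique = TRUE)
str(args)

# do.call(CJ, args) is the same call as writing the arguments out by hand
via_do_call <- do.call(CJ, args)
direct <- CJ(a = toy$a, b = toy$b, unique = TRUE)

stopifnot(isTRUE(all.equal(via_do_call, direct)))
```

Because the argument list is built from column positions, the same call should work regardless of how many columns there are or what they are named.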
    

    Checking against the OP's output:

    identical(DT.out, DT.out2)
    [1] TRUE
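
When there is no hand-built DT.out to compare against, a row count can serve as a rough sanity check: each prefix of columns contributes the product of its distinct-value counts. The sketch below wraps the approach in a hypothetical helper, `cj_prefixes` (the name and the toy table are assumptions for illustration), and compares the number of rows with that expectation:

```r
library(data.table)

# Hypothetical helper: one CJ per leading subset of columns, stacked with NA fill
cj_prefixes <- function(dt) {
  rbindlist(
    lapply(seq_along(dt), function(i) {
      do.call(CJ, c(dt[, seq_len(i), with = FALSE], unique = TRUE))
    }),
    fill = TRUE
  )
}

# Made-up data: 1, 2 and 3 distinct values in columns g, h and k
toy <- data.table(
  g = c("A", "A", "A", "A"),
  h = c("p", "q", "p", "q"),
  k = c("u", "v", "w", "u")
)

out <- cj_prefixes(toy)

# Expected rows: 1 + 1*2 + 1*2*3 = 9
n_unique <- vapply(toy, uniqueN, integer(1))
expected_rows <- sum(cumprod(n_unique))
stopifnot(nrow(out) == expected_rows)
```

The helper leaves the column names unchanged; a "new." prefix can still be added afterwards with setnames, as in the answer above.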