Search code examples
rdata.tabletranspose

Transpose a data.table (columns names -> first column of output)


What is the most efficient way to traspose

> dt <- data.table( x = c(1, 1, 3, 1, 3, 1, 1), y = c(1, 2, 1, 2, 2, 1, 1) )
> dt
   x y
1: 1 1
2: 1 2
3: 3 1
4: 1 2
5: 3 2
6: 1 1
7: 1 1

into:

> output
    cn   v1   v2   v3   v4   v5   v6   v7
1:   x    1    1    3    1    3    1    1
2:   y    1    2    1    2    2    1    1

dcast.data.table is supposed to be efficient, but I can't figure out how exactly it has to be done


Solution

  • How about data.table::transpose:

    data.table(cn = names(dt), transpose(dt))
    #   cn V1 V2 V3 V4 V5 V6 V7
    #1:  x  1  1  3  1  3  1  1
    #2:  y  1  2  1  2  2  1  1
    

    If you are really concerned about efficiency, this may be better:

    tdt <- transpose(dt)[, cn :=  names(dt)]
    setcolorder(tdt, c(ncol(tdt), 1:(ncol(tdt) - 1)))
    tdt
    #   cn V1 V2 V3 V4 V5 V6 V7
    #1:  x  1  1  3  1  3  1  1
    #2:  y  1  2  1  2  2  1  1 
    

    transpose seems to be a little faster than t (which calls do_transpose), but not by a large margin. I would guess that both of these implementations are roughly near the upper bound of efficiency for non in-place transposition algorithms.

    Dt <- data.table( 
        x = rep(c(1, 1, 3, 1, 3, 1, 1), 10e2), 
        y = rep(c(1, 2, 1, 2, 2, 1, 1), 10e2))
    
    all.equal(data.table(t(Dt)), data.table(transpose(Dt)))
    #[1] TRUE
    
    microbenchmark::microbenchmark(
        "base::t" = data.table(t(Dt)),
        "data.table::transpose" = data.table(transpose(Dt))
    )
    #Unit: milliseconds
    #                 expr      min       lq     mean   median       uq      max neval
    #base::t               415.4200 434.5308 481.4373 458.1619 507.9556 747.2022   100
    #data.table::transpose 409.5624 425.8409 474.9709 444.5194 510.3750 685.0543   100