Search code examples
rdata.tablelapply

How to reverse an ordering operation on data table in R?


Let DT be an examplary data table. DT2 is constructed from DT by sorting each column independently.

As an example the table DT:

ID Letter
1 B
4 A
3 C
2 D
5 E

becomes the table DT2:

ID Letter
1 A
2 B
3 C
4 D
5 E

In reordering_dt I have saved what orderings I used to make the transformation from DT to DT2.

# The original data table
DT <- data.table(
  ID = c(1, 4, 3, 2, 5),
  Letter = c("B", "A", "C", "D", "E")
)
# The "sorted" data table
DT2 <- DT[, lapply(.SD, function(col) col[order(col)])]

# storing what orderings we used to get DT2 from DT
reordering_dt <- as.data.table(lapply(DT, order))

Using only reordering_dt and DT2, how can I get DT again?

I should be able to make the transformation if I can do something like this:

recovered_DT <- DT2[, lapply(.SD, function(col) col[order(X)])]

with X being the corresponding column in reordering_dt.

However I don't know how to specify X correctly.


Solution

  • Sure, we can reverse it using this:

    DT3 <- Map(function(val, ord) val[order(ord)], DT2, reordering_dt) |>
      as.data.table()
    DT3
    #       ID Letter
    #    <num> <char>
    # 1:     1      B
    # 2:     4      A
    # 3:     3      C
    # 4:     2      D
    # 5:     5      E
    all.equal(DT, DT3)
    # [1] TRUE
    

    If you're working with significantly large datasets, I suggest doing order(.) only once, perhaps

    reordering_dt <- as.data.table(lapply(DT, order))
    DT2b <- Map(function(val, ord) val[ord], DT, reordering_dt) |>
      as.data.table()
    all.equal(DT2, DT2b)
    # [1] TRUE