Search code examples
rnesteddata.tableunnest

Any more concise `data.table` way to unnest a nested column in data.table?


Assuming we have a data.table with a nested column val

dt <- data.table(
  grp = c(1, 2, 1, 3, 4),
  val = list("a", c("b", "c"), c("d", "e", "f"), "g", c("h", "i"))
)

which shows as

> dt
   grp   val
1:   1     a
2:   2   b,c
3:   1 d,e,f
4:   3     g
5:   4   h,i

Question

I would like to unnest the val column, where a possible option is using tidyr::unnest, i.e.,

> dt %>%
+   unnest(val)
# A tibble: 9 × 2
    grp val
  <dbl> <chr>
1     1 a
2     2 b
3     2 c
4     1 d
5     1 e
6     1 f
7     3 g
8     4 h
9     4 i

I am wondering if we can implement it by using the data.table only.

Watch out the order of values in the column grp, I would like to preserve the order like 1,2,1,3,4 rather than 1,1,2,3,4.


My data.table Attempt

My attempt is as below

> dt[, id := .I][, lapply(.SD, unlist), id][, id := NULL][]
   grp val
1:   1   a
2:   2   b
3:   2   c
4:   1   d
5:   1   e
6:   1   f
7:   3   g
8:   4   h
9:   4   i

or

> dt[,.(grp = rep(grp,lengths(val)), val = unlist(val))]
   grp val
1:   1   a
2:   2   b
3:   2   c
4:   1   d
5:   1   e
6:   1   f
7:   3   g
8:   4   h
9:   4   i

but I guess there might be some more concise and elegant way to do this, e.g., without creating an auxiliary column id or using rep + lengths.

Any idea? Much appreciated!


Solution

  • One more option:

    dt[, .(grp, val = unlist(val)), by = .I][, !"I"]
    #      grp    val
    #    <num> <char>
    # 1:     1      a
    # 2:     2      b
    # 3:     2      c
    # 4:     1      d
    # 5:     1      e
    # 6:     1      f
    # 7:     3      g
    # 8:     4      h
    # 9:     4      i
    

    PS. If you are using data.table 1.4.2 or older by = .I won't work. Instead you can use:

    dt[, .(grp, val = unlist(val)), by = 1:nrow(dt)][, !"nrow"]
    

    PS2. I think your dt[,.(grp = rep(grp,lengths(val)), val = unlist(val))] is neater.