How to create by reference a column that lists all the occurrences for a given aggregation


Given the following data.table:

library(data.table)
dt <- data.table(  
  a=c(1,3,1,3,5:8)
  , b=c(2,4,2,4,9:12)
  , signal = c("up", "up", "down", "down")
)

I need to create a column that lists all the signal instances for a given pair of a and b. The following code works:

dt_out <- dt[
  , .(temp = signal |> list(), signal)
  , .(a,b)
] 

Which correctly outputs:

> dt_out
       a     b    temp signal
   <num> <num>  <list> <char>
1:     1     2 up,down     up
2:     1     2 up,down   down
3:     3     4 up,down     up
4:     3     4 up,down   down
5:     5     9      up     up
6:     6    10      up     up
7:     7    11    down   down
8:     8    12    down   down

However, when trying to create temp directly within dt by reference it does not work as expected:

dt[
   , temp := signal |> list()
   , .(a,b)]

The output does not show temp as a list; it is a plain character column:

> dt
       a     b signal   temp
   <num> <num> <char> <char>
1:     1     2     up     up
2:     3     4     up     up
3:     1     2   down   down
4:     3     4   down   down
5:     5     9     up     up
6:     6    10     up     up
7:     7    11   down   down
8:     8    12   down   down

How do I fix this so that temp is created as a list column by reference in dt?


Solution

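  • You've got to double up the list. With :=, the outer list is read as the list of columns to assign, so a single list(signal) just assigns signal itself. The inner list is what makes each group's value a single list element: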

    dt[, temp := .(.(signal)), by=.(a,b)]
    ## or
    dt[, temp := list(list(signal)), by=.(a,b)]
    dt
    #       a     b signal    temp
    #   <num> <num> <char>  <list>
    #1:     1     2     up up,down
    #2:     3     4     up up,down
    #3:     1     2   down up,down
    #4:     3     4   down up,down
    #5:     5     9     up      up
    #6:     6    10     up      up
    #7:     7    11   down    down
    #8:     8    12   down    down
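
To see why the extra list matters, here is a minimal sketch that compares the single and double wrapping side by side. It reuses the data from the question; the names single and double are only illustrative, and copy() is used so that the original dt is left untouched.

```r
library(data.table)

# Same data as in the question; signal (length 4) is recycled to 8 rows.
dt <- data.table(
  a = c(1, 3, 1, 3, 5:8),
  b = c(2, 4, 2, 4, 9:12),
  signal = c("up", "up", "down", "down")
)

# Single list(): the RHS is read as "a list of columns", so temp simply
# receives each group's signal values and stays a character column.
single <- copy(dt)[, temp := list(signal), by = .(a, b)]
class(single$temp)    # "character"

# Double list(): the outer list is the list of columns, the inner list is
# a length-1 list value that is recycled to every row of the group.
double <- copy(dt)[, temp := list(list(signal)), by = .(a, b)]
class(double$temp)    # "list"
lengths(double$temp)  # 2 2 2 2 1 1 1 1
```

Checking class() and lengths() on the new column is a quick way to confirm that each row holds the full vector of signals for its (a, b) group rather than a single string.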