Search code examples
rjoindata.tablecross-join

Add missing rows to data.table


Let's say that my options of values are 1 to 9. I have the following data.table:

> group = c(1,1,1,2,2,2,2,3,3,3,3,4,4)
> value = c(2,3,4,5,6,7,8,2,4,6,7,4,9)
> a=data.table(group=group,value=value)
> a
    group value
 1:     1     2
 2:     1     3
 3:     1     4
 4:     2     5
 5:     2     6
 6:     2     7
 7:     2     8
 8:     3     2
 9:     3     4
10:     3     6
11:     3     7
12:     4     4
13:     4     9

Now for every group, I want to add the missing rows. I.e. for group 1, I miss values 1 and 5-9. However, if I do it with CJ:

> b=a[CJ(group=group,value=value,unique=TRUE),on=.(group,value)]
> b
    group value
 1:     1     2
 2:     1     3
 3:     1     4
 4:     1     5
 5:     1     6
 6:     1     7
 7:     1     8
 8:     1     9
 9:     2     2
10:     2     3
11:     2     4
12:     2     5
13:     2     6
14:     2     7
15:     2     8
16:     2     9
17:     3     2
18:     3     3
19:     3     4
20:     3     5
21:     3     6
22:     3     7
23:     3     8
24:     3     9
25:     4     2
26:     4     3
27:     4     4
28:     4     5
29:     4     6
30:     4     7
31:     4     8
32:     4     9
    group value

I miss the value 1. As it is not in this data table at all. But I still consider it as one of my options. How can I do this?


Solution

  • As indicated in @Roland's comment, instead of value = value in CJ(), use:

    value = seq_len(max(value))
    

    Or specify the range you would like in your value column.

    Thus, you simply need to modify your attempt from being:

    b = a[CJ(group = group, value = value, unique = TRUE), on = .(group,value)]
    

    to being:

    b = a[CJ(group = group, value = seq_len(max(value)), unique = TRUE),
          on = .(group,value)]