Search code examples
rcountdata.tableunique

R data.table count unique values within multiple columns by group


> require(data.table)
> have <- data.table(ID = c(1,1,1,2,2)
+                      , colA = c("A","B","A","A","A")
+                      , colB = c("C","A","B","B","C"))
> have
   ID colA colB
1:  1    A    C
2:  1    B    A
3:  1    A    B
4:  2    A    B
5:  2    A    C
> want <- data.table(ID = c(1,2), UnN = c(3,3))
> want
   ID UnN
1:  1   3
2:  2   3

I have a datatable 'have', and I would like to count unique values by group 'ID' within multiple columns 'colA' and 'colB'. How to achieve it?

Not sure why the following wouldn't work:

have[, UnN = uniqueN(c("colA","colB")), by = C("ID")]

Solution

  • Remove the quotes around the column names so that the columns can be evaluated as vectors before being passed to uniqueN function, otherwise they are evaluated as literal character vectors:

    have[, .(UnN = uniqueN(c(colA, colB))), ID]
    #   ID UnN
    #1:  1   3
    #2:  2   3