Search code examples
rdataframedata.tableunique

For R data.table, how to use uniqueN() in order count unique/distinct values in multiple columns?


Let's say we have the following R data.table (though a dataframe is fine in principle as well):

library(data.table)
new_dt = as.data.table(readcsv(...))
head(new_dt)
    name order_no  overlimit
1    Amy       32  TRUE
2   Jack       14  TRUE
3   Jack       16  TRUE
4   Dave       11  FALSE
5    Amy       22  TRUE
6   Jack       16  TRUE
7    Tom       19  FALSE
8  Larry       22  FALSE
9    Tom       89  FALSE
10  Dave       81  TRUE
11  Jack       67  TRUE
12   Tom       60  TRUE
13   Amy       23  FALSE
14  Jack       16  TRUE

For this task, the values of order_no is irrelevant---I would like to count the unique number of rows for name and overlimit:

name    overlimit  distinct_groups
Amy     TRUE       2
Amy     FALSE      1
Jack    TRUE       4
Jack    FALSE      0
Dave    TRUE       1
Dave    FALSE      1
...

Does one simply include more columns in the by argument?


Solution

  • To answer your question, yes, you can just add both columns to the by argument:

    dt[, .(distinct_groups = uniqueN(order_no)), by = c("Name", "Overlimit")]