Search code examples
rcountsummarize

How to return the number of unique observations in each group of a data frame


I have a data frame similar to this:

data <- data.frame(
  Location = rep(letters[1:10], each = 20),
  ID = rep(1:40, each = 5)
)

I want to return a table that contains each unique Location in one column and a count of the number of unique IDs in each Location in another column, so it will look like this:

Location   Count
   a         4
   b         4
   ...      ...

Note: in my actual data set there are different numbers of IDs in each Location, and there are other variables in other columns.

What is the best way to do this?


Solution

  • We can use n_distinct on the 'ID' column after grouping by 'Location'. In the example, it is all 4

    library(dplyr)
    data %>% 
        group_by(Location) %>%
        summarise(Count = n_distinct(ID))
    

    If we need to add a new column, use mutate instead of summarise


    With data.table, this can be done with uniqueN

    library(data.table)
    setDT(data)[, .(Count = uniqueN(ID)), Location]