I have a data frame similar to this:
data <- data.frame(
Location = rep(letters[1:10], each = 20),
ID = rep(1:40, each = 5)
)
I want to return a table that contains each unique Location
in one column and a count of the number of unique ID
s in each Location
in another column, so it will look like this:
Location Count
a 4
b 4
... ...
Note: in my actual data set there are different numbers of ID
s in each Location
, and there are other variables in other columns.
What is the best way to do this?
We can use n_distinct
on the 'ID' column after grouping by 'Location'. In the example, it is all 4
library(dplyr)
data %>%
group_by(Location) %>%
summarise(Count = n_distinct(ID))
If we need to add a new column, use mutate
instead of summarise
With data.table
, this can be done with uniqueN
library(data.table)
setDT(data)[, .(Count = uniqueN(ID)), Location]