dplyr: count unique values


I have a data frame:

df <- data.frame(strain = 1:6, sample = c("a24", "a24", "a24", "a26", "a26", "a27"), region = c(rep("ny", 3), rep("detroit",3)))

I want to count the number of unique samples per region and get something like:

region sample_count
ny 1
detroit 2

That is, ny has only one sample ("a24"), and detroit has two samples ("a26" and "a27").


Solution

  • Group by region, then count the distinct samples in each group with n_distinct():

    library(dplyr)
    
    df |> 
      group_by(region) |> 
      summarise(sample_count = n_distinct(sample))
    

    The output is:

    # A tibble: 2 × 2
      region  sample_count
      <chr>          <int>
    1 detroit            2
    2 ny                 1
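
For comparison, here is a sketch of another way to get the same counts: drop the duplicate region/sample pairs first with distinct(), then count the remaining rows per region with count(). The name `result` is only illustrative, and the code assumes R 4.1 or later for the native pipe, as in the answer above.

```r
library(dplyr)

# Same data as in the question
df <- data.frame(
  strain = 1:6,
  sample = c("a24", "a24", "a24", "a26", "a26", "a27"),
  region = c(rep("ny", 3), rep("detroit", 3))
)

result <- df |>
  distinct(region, sample) |>            # keep one row per region/sample pair
  count(region, name = "sample_count")   # rows left per region = unique samples

# detroit has 2 unique samples, ny has 1
print(result)
```

The n_distinct() version is shorter when only the count is needed. The distinct() step can be handy if the de-duplicated region/sample pairs are also wanted for something else.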