Count occurrences of a value in repeated measures

Hi, I have the dataset below:

ID <- c(1,1,1,2,2,3,3,3,4,4,4)
diagnosis <- c("A","A","B","C","C","B","A","A","C","C","B")
df <- data.frame(ID,diagnosis)

ID diagnosis
1  A
1  A
1  B 
2  C
2  C
3  B
3  A
3  A
4  C 
4  C
4  B

I would like to count how many people had each type of diagnosis. Some people have the same diagnosis multiple times, and I would like each of them to be counted only once per diagnosis.

i.e. only two people would have diagnosis "A" (ID 1 and ID 3).

i.e. only two people would have diagnosis "C" (ID 2 and ID 4).

i.e. three people would have diagnosis "B" (ID 1, ID 3 and ID 4).

I'm wondering if there's a way of summarizing the above into a table.

I would appreciate all the help there is! Thanks!!!

Solution

You could group by diagnosis with group_by() and then use summarise() with n_distinct() to count the distinct IDs in each group, like this:

library(dplyr)
df %>%
  group_by(diagnosis) %>%
  summarise(n = n_distinct(ID))
#> # A tibble: 3 × 2
#>   diagnosis     n
#>   <chr>     <int>
#> 1 A             2
#> 2 B             3
#> 3 C             2

Created on 2023-03-31 with reprex v2.0.2
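If you would rather not load dplyr, the same idea can be sketched in base R. This is an alternative, not part of the answer above: it drops repeated ID/diagnosis pairs with unique() and then tabulates what is left with table(). The names pairs, counts and result are made up for illustration.

```r
# Same data as in the question
ID <- c(1, 1, 1, 2, 2, 3, 3, 3, 4, 4, 4)
diagnosis <- c("A", "A", "B", "C", "C", "B", "A", "A", "C", "C", "B")
df <- data.frame(ID, diagnosis)

# Drop repeated ID/diagnosis pairs so each person counts once per diagnosis
# ('pairs' is an illustrative name)
pairs <- unique(df[, c("ID", "diagnosis")])

# Tabulate how many distinct IDs remain for each diagnosis
counts <- table(pairs$diagnosis)

# Convert to a data frame if a table-like summary is preferred
result <- as.data.frame(counts, stringsAsFactors = FALSE)
names(result) <- c("diagnosis", "n")
result
```

On the example data this should give the same counts as the dplyr version (A = 2, B = 3, C = 2). Deduplicating first and counting afterwards is also what n_distinct() does per group, so which version you use is mostly a matter of taste.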