Hi I have the dataset below:
ID <- c(1,1,1,2,2,3,3,3,4,4,4)
diagnosis <- c("A","A","B","C","C","B","A","A","C","C","B")
df <- data.frame(ID,diagnosis)
ID diagnosis
1 A
1 A
1 B
2 C
2 C
3 B
3 A
3 A
4 C
4 C
4 B
I would like to count how many people had each type of diagnosis. Some people have the same diagnosis recorded multiple times, and I would like each person to be counted only once per diagnosis.
i.e., only two people would have diagnosis "A" (ID 1 and ID 3)
i.e., only two people would have diagnosis "C" (ID 2 and ID 4)
i.e., only three people would have diagnosis "B" (ID 1, ID 3 and ID 4)
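To check which IDs fall under each diagnosis, here is a small base R sketch (the name ids_per_dx is just for illustration). It splits the IDs by diagnosis, keeps each ID once per group, and uses lengths() to get the per-diagnosis counts.

```r
# Example data from the question
ID <- c(1, 1, 1, 2, 2, 3, 3, 3, 4, 4, 4)
diagnosis <- c("A", "A", "B", "C", "C", "B", "A", "A", "C", "C", "B")
df <- data.frame(ID, diagnosis)

# Split IDs by diagnosis, keeping each ID only once per diagnosis
ids_per_dx <- lapply(split(df$ID, df$diagnosis), function(ids) sort(unique(ids)))
ids_per_dx           # A: 1 3, B: 1 3 4, C: 2 4

# Number of distinct people per diagnosis
lengths(ids_per_dx)  # A = 2, B = 3, C = 2
```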
Is there a way to summarize this in a table?
Any help would be appreciated. Thanks!
You could group_by on diagnosis and summarise with n_distinct to count the distinct IDs per group, like this:
library(dplyr)
df %>%
  group_by(diagnosis) %>%          # one group per diagnosis
  summarise(n = n_distinct(ID))    # count each ID only once per group
#> # A tibble: 3 × 2
#> diagnosis n
#> <chr> <int>
#> 1 A 2
#> 2 B 3
#> 3 C 2
Created on 2023-03-31 with reprex v2.0.2
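If you would rather avoid the dplyr dependency, a base R sketch of the same idea (a swapped-in approach, not part of the answer above) is to drop duplicate (ID, diagnosis) rows with unique() and then tabulate with table(). The names pairs and result are only illustrative.

```r
ID <- c(1, 1, 1, 2, 2, 3, 3, 3, 4, 4, 4)
diagnosis <- c("A", "A", "B", "C", "C", "B", "A", "A", "C", "C", "B")
df <- data.frame(ID, diagnosis)

# Keep one row per (ID, diagnosis) pair so repeat diagnoses count once
pairs <- unique(df[c("ID", "diagnosis")])

# Tabulate the remaining rows by diagnosis and turn the counts into a data frame
result <- as.data.frame(table(pairs$diagnosis), responseName = "n",
                        stringsAsFactors = FALSE)
names(result)[1] <- "diagnosis"
result  # one row per diagnosis: A = 2, B = 3, C = 2
```

Here unique() plays the role of n_distinct(). Subsetting to ID and diagnosis first matters if df has extra columns (for example visit dates); otherwise rows that differ only in those columns would be counted more than once.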