Count occurrence of value in repeated measure


Hi, I have the dataset below:

ID <- c(1,1,1,2,2,3,3,3,4,4,4)
diagnosis <- c("A","A","B","C","C","B","A","A","C","C","B")
df <- data.frame(ID,diagnosis)

ID diagnosis
1  A
1  A
1  B 
2  C
2  C
3  B
3  A
3  A
4  C 
4  C
4  B

I would like to count how many people had each type of diagnosis. Some people have the same diagnosis multiple times, and I would like each of them to be counted only once per diagnosis.

i.e. only two people would have diagnosis "A" (ID 1 and ID 3).

i.e. only two people would have diagnosis "C" (ID 2 and ID 4).

i.e. only three people would have diagnosis "B" (ID 1, ID 3 and ID 4).

I'm wondering if there's a way of summarizing the above into a table.

I would appreciate all the help there is! Thanks!!!


Solution

  • You could group_by diagnosis and summarise with n_distinct to count the distinct IDs per group, like this:

    library(dplyr)
    df %>%
      group_by(diagnosis) %>%
      summarise(n = n_distinct(ID))
    #> # A tibble: 3 × 2
    #>   diagnosis     n
    #>   <chr>     <int>
    #> 1 A             2
    #> 2 B             3
    #> 3 C             2
    

    Created on 2023-03-31 with reprex v2.0.2
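
If you would rather not load dplyr, the same idea can probably be sketched in base R: drop the repeated (ID, diagnosis) pairs first, then tabulate what is left. This assumes the same df as in the question; the names pairs, counts and result are just illustrative, not from the original post.

```r
# Same data as in the question
ID <- c(1,1,1,2,2,3,3,3,4,4,4)
diagnosis <- c("A","A","B","C","C","B","A","A","C","C","B")
df <- data.frame(ID, diagnosis)

# Keep one row per (ID, diagnosis) pair so each person counts once per diagnosis
pairs <- unique(df[, c("ID", "diagnosis")])

# Count the remaining rows for each diagnosis
counts <- table(diagnosis = pairs$diagnosis)

# Optional: turn the table into a data frame with columns diagnosis and n
result <- as.data.frame(counts, responseName = "n")
result
```

With the example data this should give the same counts as the dplyr version (A = 2, B = 3, C = 2), since ID 3 rather than ID 2 is the third person with diagnosis "B".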