
How to combine count() and group_by() to count responses with a certain value, grouped by respondent?


I have a set of data where the response to a series of repeated questions is the outcome of interest. Because of this, I'd like to count the number of "I don't know" responses, grouping those counts by respondent ID, and append it as a new column. So basically, I have data that look like this:

ID response
1 Yes
1 I don't know
2 No
2 I don't know

And I want them to look like this:

ID response idkcount
1 Yes 1
1 I don't know 1
2 No 1
2 I don't know 1

This is the code I've most recently written:

df$idkcount <- group_by(as_tibble(df$ID)) %>% count(df$response == "I don't know")

But I seem to get an error message no matter what I try with these two commands. What am I missing?


Solution

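  • count() collapses the data to one row per group, so its result cannot simply be assigned back as a column of df. To keep every row, use group_by() with mutate() and sum the logical comparison within each respondent.

    Note: I slightly altered your example data to cover a more general case (respondent 1 now has three "I don't know" responses).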

    df <- data.frame(
      ID = c(1L, 1L, 1L, 1L, 2L, 2L),
      response = c("Yes", "I don't know", "I don't know", "I don't know", "No", "I don't know")
    )
    
    library(dplyr)
    
    df %>% 
      group_by(ID) %>% 
      mutate(idkcount = sum(response == "I don't know", na.rm = TRUE)) %>% 
      ungroup()
    #> # A tibble: 6 × 3
    #>      ID response     idkcount
    #>   <int> <chr>           <int>
    #> 1     1 Yes                 3
    #> 2     1 I don't know        3
    #> 3     1 I don't know        3
    #> 4     1 I don't know        3
    #> 5     2 No                  1
    #> 6     2 I don't know        1
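
If you would rather stick with count(), as the question title asks, one possible approach is to count the matching rows per ID in a separate table and join that table back onto the original data. This is only a sketch: the names idk_counts and result are made up for illustration, and respondent 3 is a hypothetical extra row added to show what happens when someone never answers "I don't know".

```r
library(dplyr)

# Example data from the answer above, plus a hypothetical respondent 3
# who never answers "I don't know" (added only for illustration).
df <- data.frame(
  ID = c(1L, 1L, 1L, 1L, 2L, 2L, 3L),
  response = c("Yes", "I don't know", "I don't know", "I don't know",
               "No", "I don't know", "Yes")
)

# Step 1: keep only the matching rows, then count them per respondent.
# This returns one row per ID, which is why it cannot be assigned
# directly as a column of df.
idk_counts <- df %>%
  filter(response == "I don't know") %>%
  count(ID, name = "idkcount")

# Step 2: join the per-respondent counts back onto every original row.
# Respondents with no matching rows get NA from the join, so replace
# those with 0.
result <- df %>%
  left_join(idk_counts, by = "ID") %>%
  mutate(idkcount = coalesce(idkcount, 0L))

result
```

With this data, idkcount should be 3 for every row of respondent 1, 1 for respondent 2, and 0 for respondent 3. The group_by() and mutate() version is shorter and needs no join, so this variant is mainly useful if you want the per-respondent table (idk_counts) on its own as well.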