Search code examples
rdplyrtidyr

Count identical values in a string, than group, and count the results using tidyverse and dplyr


Let´s say I have this dataset with 3 variables. One of them student is a string with the "name" of students.

df = data.frame(quest = c(2,4,6), test = rep(c("math","science","arts"),3), student = c("0risk, student1, student3","student1","0risk, student2, student3")) %>% arrange(quest, test)

string

I would like to count students grouped by quest and test and return something like that

Any advice would be helpful. Thank you


Solution

  • Using

    • tidyr::separate_rows you could split the student column into rows,
    • get the counts using dplyr::count,
    • tidyr::complete your dataset to get all combos of the categorical variables and
    • finally use tidyr::pivot_wider to convert to wide format:
    library(dplyr)
    library(tidyr)
    
    df %>% 
      tidyr::separate_rows(student) %>% 
      count(quest, test, student, .drop = FALSE) %>% 
      tidyr::complete(quest, test, student, fill = list(n = 0)) %>% 
      tidyr::pivot_wider(names_from = test, values_from = n, values_fill = 0)
    #> # A tibble: 12 × 5
    #>    quest student   arts  math science
    #>    <dbl> <chr>    <dbl> <dbl>   <dbl>
    #>  1     2 0risk        0     3       0
    #>  2     2 student1     0     3       0
    #>  3     2 student2     0     0       0
    #>  4     2 student3     0     3       0
    #>  5     4 0risk        0     0       0
    #>  6     4 student1     0     0       3
    #>  7     4 student2     0     0       0
    #>  8     4 student3     0     0       0
    #>  9     6 0risk        3     0       0
    #> 10     6 student1     0     0       0
    #> 11     6 student2     3     0       0
    #> 12     6 student3     3     0       0