Is there a way to group data based on a column that separates values with commas in R?


Say there is a data frame like this:

   A  B
1  1  gr1, gr2
2  3  class1, gr1
3  4  gr2

Is there a way to summarize the data for each comma-separated value in column B? For example, to get the mean of A for each group, like so:

   group   mean
1  gr1     2
2  gr2     2.5
3  class1  3

Solution

  • That can easily be done with the function separate_rows() from tidyr:

    library(tidyverse)
    
    # Example data from the question
    dat <-
      tibble(A = c(1, 3, 4),
             B = c("gr1, gr2", "class1, gr1", "gr2"))
    
    dat %>%
      separate_rows(B, sep = ", ") %>%  # one row per comma-separated group
      group_by(B) %>% 
      summarize(mean = mean(A))         # mean of A within each group
    
    
    # A tibble: 3 x 2
      B       mean
      <chr>  <dbl>
    1 class1   3  
    2 gr1      2  
    3 gr2      2.5
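
If loading the tidyverse is not an option, the same split-then-group idea can be sketched in base R. This is only an illustration of the approach in the answer above, not part of it: the names `groups`, `long` and `result` are made up for the example, and the pattern `",\\s*"` assumes groups are separated by a comma with optional whitespace after it.

```r
# Example data from the question, as a plain data frame
dat <- data.frame(A = c(1, 3, 4),
                  B = c("gr1, gr2", "class1, gr1", "gr2"),
                  stringsAsFactors = FALSE)

# Split each entry of B on a comma followed by optional whitespace;
# the result is a list with one character vector per row
groups <- strsplit(dat$B, ",\\s*")

# Repeat each A value once per group listed in its row,
# giving one row per (A, group) pair
long <- data.frame(A = rep(dat$A, lengths(groups)),
                   group = unlist(groups))

# Mean of A within each group
result <- aggregate(A ~ group, data = long, FUN = mean)
names(result)[2] <- "mean"
result
```

`aggregate()` returns the groups in sorted order, so the rows should line up with the tibble shown above.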