Search code examples
rdplyrspread

Spreading a dataframe with two grouping columns


I have a data set of teachers as follows:

df <- data.frame(
  teacher = c("A", "A", "A", "A", "B", "B", "C", 'C'),
  seg = c("1", '1', "2", "2", "1", "2", "1", "2"),
  claim = c(
    "beth",
    'john',
    'john',
    'beth',
    'summer',
    'summer',
    "hannah",
    "hannah"
  )
)

I would ideally like to spread my dataset like this:

Desired output.

enter image description here

Any ideas for how I can use either spread or pivot_wide to achieve this? The issue is that there are two grouping variables here (teacher and segment). Some teachers may have multiple of the same segment, but some teachers don't.


Solution

  • One option would be to create a sequence column grouped by 'teacher', 'seg', and then use pivot_wider

    library(dplyr)
    library(tidyr)
    library(stringr)
    df %>% 
      group_by(teacher, seg) %>%
      mutate(segN = c("", "double")[row_number()]) %>%
      ungroup %>%
      mutate(seg = str_c("seg", seg, segN)) %>%
      select(-segN) %>%
      pivot_wider(names_from = seg, values_from = claim)
    # A tibble: 3 x 5
    #  teacher seg1   seg1double seg2   seg2double
    #   <fct>   <fct>  <fct>      <fct>  <fct>     
    #1 A       beth   john       john   beth      
    #2 B       summer <NA>       summer <NA>      
    #3 C       hannah <NA>       hannah <NA>    
    

    It can be simplified with rowid from data.table

    library(data.table)
    df %>% 
      mutate(seg = str_c('seg', c('', '_double')[rowid(teacher, seg)], seg)) %>%
       pivot_wider(names_from = seg, values_from = claim)
       #or use spread
       # spread(seg, claim)
    #  teacher   seg1 seg_double1   seg2 seg_double2
    #1       A   beth        john   john        beth
    #2       B summer        <NA> summer        <NA>
    #3       C hannah        <NA> hannah        <NA>