Search code examples
rdplyrreshape2

Split a long form into rows but with duplicates merged as a string delimited with ";"?


Suppose I have a dataframe that I want to split into rows.

temp = data.frame ( group=c('a','b','c'), fruits = c('apple', 'orange', 'none'), days=c('mon','tues','wed') )
reshape2::dcast(temp , days ~ group, value.var=c ( "fruits") )
  days     a      b    c
1  mon apple   <NA> <NA>
2 tues  <NA> orange <NA>
3  wed  <NA>   <NA> none

This works fine, however when I add a duplicated row as such.

temp = rbind ( temp, c('a','orange','mon')  )

the casting would failed and only show the total. What I really want is something like this.

 days     a      b    c
1  mon apple;orange   <NA> <NA>
2 tues  <NA> orange <NA>
3  wed  <NA>   <NA> none

thanks!


Solution

  • Try this tidyverse solution. You can aggregate your data using paste0() to get a proper structure in order to transform to wide format:

    library(tidyverse)
    #Code
    temp %>%
      group_by(group,days) %>%
      summarise(fruits=paste0(fruits,collapse = ';')) %>%
      pivot_wider(names_from = group,values_from=fruits)
    

    Output:

    # A tibble: 3 x 4
      days  a            b      c    
      <chr> <chr>        <chr>  <chr>
    1 mon   apple;orange NA     NA   
    2 tues  NA           orange NA   
    3 wed   NA           NA     none