Tags: r, dataframe, dplyr, group-by, aggregate

Make a list from dataframe column


I have a data frame df with columns a and b:

a b
xx Apple
yy Orange
zz Apple
dd Mango
pp Mango

I would like the output to be:

Apple xx,zz
Orange yy
Mango dd,pp

I tried aggregate and group_by, but could not get either to work.


Solution

  • Base R:

    A single aggregate call is enough here. The formula a ~ b applies FUN to column a within each group of column b. Using paste as FUN with the collapse argument joins each group's strings into a single string.

    aggregate(a ~ b, df, FUN = paste, collapse = ",")
    
           b     a
    1  Apple xx,zz
    2  Mango dd,pp
    3 Orange    yy
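
To make the base R answer reproducible, here is a minimal sketch that rebuilds the example data and runs the same aggregate call. The second half is an assumption on my part: the desired output in the question lists the groups in order of first appearance (Apple, Orange, Mango), while aggregate sorts a character grouping column alphabetically. If that order matters, converting b to a factor whose levels follow first appearance is one way to get it. The names res_sorted, df_ordered and res_ordered are only for illustration.

```r
# Rebuild the example data from the question. stringsAsFactors = FALSE keeps
# both columns as character on R versions before 4.0.
df <- data.frame(
  a = c("xx", "yy", "zz", "dd", "pp"),
  b = c("Apple", "Orange", "Apple", "Mango", "Mango"),
  stringsAsFactors = FALSE
)

# Default behaviour: groups of a character column come back sorted alphabetically.
res_sorted <- aggregate(a ~ b, df, FUN = paste, collapse = ",")

# Assumption: the order of first appearance matters. Making b a factor whose
# levels follow that order lets aggregate() return the groups in level order.
df_ordered <- transform(df, b = factor(b, levels = unique(b)))
res_ordered <- aggregate(a ~ b, df_ordered, FUN = paste, collapse = ",")

print(res_ordered)
```

Note that in res_ordered the column b is a factor rather than a character vector; wrap it in as.character() if plain strings are needed downstream.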
    

  • dplyr:

    Since you mentioned group_by, the dplyr equivalent is to group by b and then collapse a inside summarize:

    library(dplyr)
    
    df %>% group_by(b) %>% summarize(a = paste(a, collapse = ","))
    
    # A tibble: 3 × 2
      b      a    
      <chr>  <chr>
    1 Apple  xx,zz
    2 Mango  dd,pp
    3 Orange yy
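
The question title asks for a "list", so it may be worth noting a variant that keeps each group's values as a real R list element instead of one collapsed string. This is a sketch under that assumption, not something the question explicitly requested; res_list and by_fruit are illustrative names.

```r
library(dplyr)

# Same example data as in the question.
df <- data.frame(
  a = c("xx", "yy", "zz", "dd", "pp"),
  b = c("Apple", "Orange", "Apple", "Mango", "Mango"),
  stringsAsFactors = FALSE
)

# list(a) stores each group's values as a character vector in a list-column,
# so the individual elements stay separate instead of being pasted together.
res_list <- df %>%
  group_by(b) %>%
  summarize(a = list(a))

# Name the list elements by group so they can be looked up directly.
by_fruit <- setNames(res_list$a, res_list$b)
by_fruit[["Apple"]]
```

If dplyr is not needed otherwise, base R's split(df$a, df$b) should give a similar named list in one call.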