Search code examples
rlistdataframegroup-byconcatenation

How to concatenate dataframe elements in one row


I have a dataframe like this:

df <- data.frame("COL_1" = "a", "COL_2" = c("a","b","c"), "COL_3" = "b", "COL_4" = "c", "COL_6" = "d")

Output:

  DB_INDEX|COL_2|COL_3|COL_4|COL_6
1     a     a     b     c     d
2     a     b     b     c     d
3     a     c     b     c     d

I would like to create a new dataframe which contains only one row, like this:

Expected_output:

 DB_INDEX|COL_2|COL_3|COL_4|COL_6
1     a  [a,b,c]  b     c     d

So, basically I would like to "group by" columns db_index,3,4,6, and create a list of values where they are different in col_2. DB_INDEX, COL_3, COL_4, COL_5 will always have the same value, only COL_2 is changing.

Any ideas that can help me?

edited

I found this:

  df <- df %>% group_by(COL_1) %>% 
    mutate(COL_2 = paste0(COL_2, collapse = " "))

But it returns a string, so I tried to modify it as this:

  df <- df %>% group_by(COL_1) %>% 
    mutate(COL_2 = as.list(COL_2)

But it returns this error:

replacement has 2 rows, data has 1 

COL_2 row must contain a list, I know in Python Pandas is possible, is it in R?

It's my first week in R so I am finding some trouble. Thanks in advance :)


Solution

  • You can try:

    df %>%
     group_by(DB_INDEX) %>%
     summarise_all(~ list(unique(.)))
    
      DB_INDEX COL_2     COL_3     COL_4     COL_6    
      <chr>    <list>    <list>    <list>    <list>   
    1 a        <chr [3]> <chr [1]> <chr [1]> <chr [1]>