Search code examples
rdataframetibble

How to merge the rows of a tibble to collapse cells into a single one


I have a tibble with a thousand something rows, and two columns, the first of which contains a categorical label, and the second a character string. Here is a MRE

tab <- tibble(category = c("CAT1", "CAT1", "CAT1", "CAT1", "CAT1", "CAT2", "CAT2", "CAT2", "CAT2"),
              word = c("Lorem", "ipsum", "dolor", "sit", "amet", "Consectetur", "adipiscing", "elit", "nam"))
tab

# A tibble: 9 x 2
  category word       
  <chr>    <chr>      
1 CAT1     Lorem      
2 CAT1     ipsum      
3 CAT1     dolor      
4 CAT1     sit        
5 CAT1     amet       
6 CAT2     Consectetur
7 CAT2     adipiscing 
8 CAT2     elit       
9 CAT2     nam

Now, what I'd like to do is to collapse these rows in order to have only one row per category and all the words for that category together in a single cell, separated by a semi-colon. Like that:

# A tibble: 2 x 2
  category word                              
  <chr>    <chr>                             
1 CAT1     Lorem; ipsum; dolor; sit; amet    
2 CAT2     Consectetur; adipiscing; elit; nam

Does anyone know how I could tackle this issue and is willing to fly my rescue?


Solution

  • We can use

    library(dplyr)
    library(stringr)
    tab %>%
        group_by(category) %>%
        summarise(word = str_c(word, collapse ="; "))
    

    -output

    # A tibble: 2 x 2
      category word                              
      <chr>    <chr>                             
    1 CAT1     Lorem; ipsum; dolor; sit; amet    
    2 CAT2     Consectetur; adipiscing; elit; nam