r, text, nlp

Collapse multiple rows with the same id into one row, when each id has a different number of rows


I have data like this, where each id has a different number of rows:

id word
1 arrive
1 bus
1 stop
1 time
1 beard
1 bearded
1 sits
2 whilst
2 begin
2 argue
2 seat
2 time
2 police
3 officer
3 walks
3 intervenes

I want to convert it to a dataset with one row per id, like this:

id word
1 arrive bus stop time beard bearded sits
2 whilst begin argue seat time police
3 officer walks intervenes

Is it possible?

Thank you.


Solution

  • To add some detail to my comment: group the rows by id, then paste each group's words into a single string.

    library(dplyr)
    
    data <- tibble::tribble(
      ~id,        ~word,
       1L,     "arrive",
       1L,        "bus",
       1L,       "stop",
       1L,       "time",
       1L,      "beard",
       1L,    "bearded",
       1L,       "sits",
       2L,     "whilst",
       2L,      "begin",
       2L,      "argue",
       2L,       "seat",
       2L,       "time",
       2L,     "police",
       3L,    "officer",
       3L,      "walks",
       3L, "intervenes"
      )
    
    data %>% 
      group_by(id) %>% 
      # Every row in a group gets the same collapsed string
      mutate(word = paste(word, collapse = " ")) %>% 
      slice(1) %>% # Keep only the first row of each group
      ungroup()
    

    Or better, use summarise(), which already returns one row per group, so you don't need the slice:

    data %>% 
      group_by(id) %>% 
      summarise(word = paste0(word, collapse = " "))
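If you would rather not load dplyr, the same grouping-and-pasting idea can be sketched in base R with aggregate(). This is only an illustrative alternative, not part of the original answer; the names df and collapsed are made up for the example, and the data is the same sample as above.

```r
# Same sample data as above, as a plain data.frame (no packages needed).
# `df` and `collapsed` are illustrative names, not from the original answer.
df <- data.frame(
  id = c(rep(1L, 7), rep(2L, 6), rep(3L, 3)),
  word = c("arrive", "bus", "stop", "time", "beard", "bearded", "sits",
           "whilst", "begin", "argue", "seat", "time", "police",
           "officer", "walks", "intervenes"),
  stringsAsFactors = FALSE
)

# For each id, paste that group's words into one space-separated string.
# `collapse = " "` is passed through aggregate() to paste().
collapsed <- aggregate(word ~ id, data = df, FUN = paste, collapse = " ")
print(collapsed)
```

Because the grouping is done per id, it does not matter that the groups have different numbers of rows; each one still collapses to a single row.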