How to aggregate groups of rows by an irregular interval in R?


I have a data frame holding the lines of a transcribed conversation, in which what each person said is separated by an empty line. I now need to aggregate the lines so that each speaker's turn becomes a single row, but the number of lines per turn is irregular. How can I aggregate this data?

The data are like this:

speech                  sep_line
Was in Augoust          0
Don't you remember?     0
                        1
Yes, i did              0
It was a hot Saturday   0
we were in the park     0
                        1
That's right            0
it was a fun day        0

I want the data to look like this:

speech
Was in Augoust, Don't you remember?
Yes, i did. It was a hot Saturday, we were in the park
That's right, it was a fun day

Solution

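  • Here's a way with dplyr, assuming the columns are named speech and sep_line. A running sum of sep_line gives every turn its own group id; the separator rows are then dropped and each group's lines are pasted together:

    library(dplyr)

    df %>%
      mutate(group = cumsum(sep_line)) %>%  # each separator row starts a new group
      filter(sep_line == 0) %>%             # drop the separator rows themselves
      group_by(group) %>%
      summarise(
        speech = paste(speech, collapse = " ")  # one string per turn
      ) %>%
      select(speech)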
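
If you would rather avoid the dplyr dependency, the same cumsum idea can be sketched in base R. This is only a minimal illustration: the data frame below is rebuilt by hand from the question, the column names speech and sep_line are assumed to match the ones used above, and the ", " separator is my choice rather than anything the question fixes.

```r
# Sample data rebuilt from the question (column names are an assumption).
df <- data.frame(
  speech = c("Was in Augoust", "Don't you remember?", "",
             "Yes, i did", "It was a hot Saturday", "we were in the park", "",
             "That's right", "it was a fun day"),
  sep_line = c(0, 0, 1, 0, 0, 0, 1, 0, 0),
  stringsAsFactors = FALSE
)

# Every separator row increases the running total by one,
# so all lines of the same turn end up with the same group id.
df$group <- cumsum(df$sep_line)

# Drop the separator rows, then paste each group's lines into one string.
turns <- df[df$sep_line == 0, ]
result <- aggregate(speech ~ group, data = turns, FUN = paste, collapse = ", ")

print(result$speech)
```

Because the group id comes from a running sum, it does not matter how many lines each turn has; change collapse to " " if you want the lines joined with plain spaces instead.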