Repeat dataframe rows based on cumsum index


I have a dataframe as follows:

df <- data.frame(title="Title", bk=c("Book 1", "Book 1", "Book 3"), ch=c("Chapter 1", "Chapter 2", "Chapter 1"))

  title     bk        ch
1 Title Book 1 Chapter 1
2 Title Book 1 Chapter 2
3 Title Book 3 Chapter 1

How do I repeat each observation based on the cumsum index below:

id=c(1,1,1,2,2,3,3,3,3)

The dataframe should be expanded to one row per element of the source vector that generated the cumsum index, like so:

  title     bk        ch   source_vector
1 Title Book 1 Chapter 1   ...
1 Title Book 1 Chapter 1   
1 Title Book 1 Chapter 1   
2 Title Book 1 Chapter 2   
2 Title Book 1 Chapter 2   
3 Title Book 3 Chapter 1   
3 Title Book 3 Chapter 1   
3 Title Book 3 Chapter 1   
3 Title Book 3 Chapter 1   
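
For reference, the expansion described above can be sketched with plain row indexing in base R. This is a minimal sketch, assuming `id` contains only valid row numbers of `df`; the name `expanded` is made up for illustration, and the `source_vector` column is left out because its values are not given.

```r
df <- data.frame(
  title = "Title",
  bk = c("Book 1", "Book 1", "Book 3"),
  ch = c("Chapter 1", "Chapter 2", "Chapter 1")
)
id <- c(1, 1, 1, 2, 2, 3, 3, 3, 3)

# Indexing rows by id repeats row i once for each occurrence of i in id
expanded <- df[id, ]

# Optional: drop the de-duplicated row names (1, 1.1, 1.2, ...) that R generates
rownames(expanded) <- NULL
```

A source vector of the same length as `id` could then be attached with something like `expanded$source_vector <- x`, where `x` stands in for that vector.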

Solution

  • This is closer to what I was looking for:

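    library(tidyverse)  # loads dplyr, tidyr and stringr

    df %>%
      mutate(str_split_content = str_split(content, " ")) %>%
      unnest(str_split_content)  # name the list-column; a bare unnest() is deprecated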
    

    Someone posted this a while ago, then revised or removed it.

    The original str_split call actually split content by punctuation, so it was not purely splitting by number of words.
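
The approach above can be sketched end to end as follows. This is only an illustration under stated assumptions: the `content` column and its sample text are invented here (the original data is not shown), and splitting after each full stop is one guess at what "by punctuation" meant. The output column is named `source_vector` to match the question.

```r
library(dplyr)
library(tidyr)
library(stringr)

# Hypothetical data: `content` is an assumed column holding each chapter's text
df <- data.frame(
  title = "Title",
  bk = c("Book 1", "Book 1", "Book 3"),
  ch = c("Chapter 1", "Chapter 2", "Chapter 1"),
  content = c("One. Two. Three.", "Four. Five.", "Six. Seven. Eight. Nine."),
  stringsAsFactors = FALSE
)

expanded <- df %>%
  # Split on whitespace that follows a full stop, giving one piece per sentence
  mutate(source_vector = str_split(content, "(?<=\\.)\\s+")) %>%
  # One output row per piece; the other columns are repeated automatically
  unnest(source_vector) %>%
  select(-content)
```

Because `unnest()` repeats the remaining columns for every element of the list-column, no explicit cumsum index is needed in this variant.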