Tags: r, string, strsplit

How to split a text using a string delimiter with R's strsplit?


Let's say I have a text file of a book made up of several chapters, each containing text.

x <- "Chapter 1 Text. Text. Chapter 2 Text. Text. Chapter 3 Text. Text."

I would like to split up this text and get a separate file for each chapter.

"Chapter 1 Text. Text." "Chapter 2 Text. Text." "Chapter 3 Text. Text."

Ideally, I would like to name each file after its chapter: Chapter 1, Chapter 2 and Chapter 3.

I have tried the following:

unlist(strsplit(x, "Chapter", perl = TRUE))

Unfortunately, this deletes the delimiter, which I would like to keep.

I have also tried the following:

unlist(strsplit(x, "(?<=Chapter)", perl=TRUE))

Unfortunately, this only seems to work for a single character but not for a string.

Many thanks for your help!


Solution

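  • We need to use a regex lookahead: split on the whitespace that is immediately followed by "Chapter". The lookahead only checks for the word without consuming it, so "Chapter" stays at the start of each piece.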

    strsplit(x, "\\s(?=Chapter)", perl = TRUE)[[1]]
    #[1] "Chapter 1 Text. Text." "Chapter 2 Text. Text." "Chapter 3 Text. Text."
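
The question also asks for one file per chapter, named after the chapter. A minimal sketch of that step, assuming every piece starts with "Chapter" followed by a number and that a temporary directory is an acceptable place to write. The names `chapters`, `labels`, `out_dir` and `paths` are illustrative and not part of the original answer:

```r
x <- "Chapter 1 Text. Text. Chapter 2 Text. Text. Chapter 3 Text. Text."

# Split on the whitespace that precedes each "Chapter"; the lookahead keeps the word.
chapters <- strsplit(x, "\\s(?=Chapter)", perl = TRUE)[[1]]

# Pull a label such as "Chapter 1" from the start of each piece.
# Assumption: every piece begins with "Chapter <number>".
labels <- regmatches(chapters, regexpr("^Chapter [0-9]+", chapters))

# Assumption: write to a temporary directory; swap in any existing directory.
out_dir <- tempdir()
paths <- file.path(out_dir, paste0(labels, ".txt"))

# Write each chapter to its own file, e.g. "Chapter 1.txt".
for (i in seq_along(chapters)) {
  writeLines(chapters[i], paths[i])
}
```

If some piece did not start with "Chapter <number>", `regmatches` would silently drop it and `labels` would be shorter than `chapters`, so checking that the two lengths agree before writing is a sensible guard.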