r, string, text, split, strsplit

Split string into multiple two-word strings


I have a very long string (~1000 words) and I would like to split it into two-word phrases.

I have this:

string <- "A B C D E F"

and I would like this:

"A B"
"B C"
"C D"
"D E"
"E F"

The long string has already been cleaned and stemmed, and stop-words have been removed.

I tried to use str_split, but (I think) this needs a separator, which is complicated here because the pairs overlap: I don't want to separate A from B, only "A B" from "C D", "B C" from "D E", etc.


Solution

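    # Split the string into individual words.
    tmp <- strsplit(string, " ")[[1]]
    tmp
    # [1] "A" "B" "C" "D" "E" "F"
    # For each position z from 2 onward, join word z-1 and word z.
    # `:` binds tighter than `-`, so z-1:0 is z - c(1, 0), i.e. c(z-1, z).
    sapply(seq_along(tmp)[-1], function(z) paste(tmp[z-1:0], collapse = " "))
    # [1] "A B" "B C" "C D" "D E" "E F"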
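
For a string of around 1000 words, a vectorised variant may be a little tidier than looping with sapply. The following is only a sketch of the same idea, not part of the original answer: it pastes the word vector without its last element against the same vector without its first element. The helper name word_pairs is hypothetical, and it assumes words are separated by single spaces, as in the cleaned string described in the question.

```r
# Hypothetical helper (name chosen for illustration only).
# Assumes words are separated by exactly one space.
word_pairs <- function(string) {
  words <- strsplit(string, " ", fixed = TRUE)[[1]]
  # Fewer than two words means there are no pairs to build.
  if (length(words) < 2) return(character(0))
  # Pair words 1..n-1 with words 2..n; paste() is vectorised
  # and joins each pair with a single space by default.
  paste(head(words, -1), tail(words, -1))
}

string <- "A B C D E F"
word_pairs(string)
# [1] "A B" "B C" "C D" "D E" "E F"
```

Because paste() works element-wise on the two shifted vectors, no explicit loop over positions is needed, and the guard returns an empty character vector when the input has fewer than two words.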