I have a data frame in which whole sentences are, in some cases, split across multiple rows.
For example, head(mydataframe) gives:
# 1 Do you have any idea what
# 2 they were arguing about?
# 3 Do--Do you speak
# 4 English?
# 5 yeah.
# 6 No, I'm sorry.
Assuming a sentence can be terminated by ".", "?", "!" or "...",
are there any R library functions capable of producing the following output?
# 1 Do you have any idea what they were arguing about?
# 2 Do--Do you speak English?
# 3 yeah.
# 4 No, I'm sorry.
This should work for all sentences ending with ".", "...", "?" or "!":
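# foo is the toy data frame defined at the end of this answer
# Join all fragments into one string
x <- paste0(foo$txt, collapse = " ")
# Split after ".", "?" or "!" wherever whitespace follows, then strip leading spaces
trimws(unlist(strsplit(x, "(?<=[?.!])(?=\\s)", perl = TRUE)))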
Credit to @AvinashRaj for the pointers on the lookbehind.
Which gives:
#[1] "Do you have any idea what they were arguing about?"
#[2] "Do--Do you speak English?"
#[3] "yeah..."
#[4] "No, I'm sorry."
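Since the question asks for one sentence per row, the same split can be wrapped in a small function and the result put back into a data frame. This is only a minimal sketch: merge_sentences() is a hypothetical helper name, not a function from any package, and the fragments are repeated here so the snippet runs on its own. It assumes every sentence terminator is followed by whitespace once the rows are joined.

```r
# Hypothetical helper (not from any package): join fragments, then split
# them back into sentences.
merge_sentences <- function(txt) {
  # Join all fragments into one string, separated by single spaces.
  x <- paste(txt, collapse = " ")
  # Split at the zero-width position that follows ".", "?" or "!"
  # and is itself followed by whitespace.
  parts <- strsplit(x, "(?<=[?.!])(?=\\s)", perl = TRUE)[[1]]
  # Every piece after the first keeps its leading space; strip it.
  trimws(parts)
}

fragments <- c("Do you have any idea what", "they were arguing about?",
               "Do--Do you speak", "English?", "yeah...", "No, I'm sorry.")

sentences <- merge_sentences(fragments)

# One sentence per row, renumbered from 1.
result <- data.frame(num = seq_along(sentences), txt = sentences,
                     stringsAsFactors = FALSE)
print(result)
```

Note that abbreviations such as "Mr. Smith" would also be split here, because the pattern only looks at the punctuation and the whitespace after it.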
I modified the toy dataset to include a case where a string ends with "..."
(as requested by the OP):
foo <- data.frame(num = 1:6,
txt = c("Do you have any idea what", "they were arguing about?",
"Do--Do you speak", "English?", "yeah...", "No, I'm sorry."),
stringsAsFactors = FALSE)
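If splitting on a zero-width position feels fragile, a match-based variant is one possible alternative. This is a sketch of a different technique from the strsplit() approach, using base R's gregexpr() and regmatches() to pull out each run of non-terminator characters together with the punctuation that follows it. The variable names (foo2, pieces) are made up for illustration, and the toy data is repeated so the snippet runs on its own.

```r
# Toy data, repeated here so the example is self-contained.
foo2 <- data.frame(num = 1:6,
                   txt = c("Do you have any idea what", "they were arguing about?",
                           "Do--Do you speak", "English?", "yeah...", "No, I'm sorry."),
                   stringsAsFactors = FALSE)

# Join all fragments into one string.
x <- paste(foo2$txt, collapse = " ")

# Match a run of characters that are not ".", "?" or "!", followed by
# zero or more terminators. Allowing zero terminators keeps a trailing
# fragment that has no end punctuation instead of dropping it.
m <- gregexpr("[^.?!]+[.?!]*", x)
pieces <- trimws(regmatches(x, m)[[1]])
print(pieces)
```

Unlike the split-based version, this does not require whitespace after the terminator, so "one.two" would be treated as two sentences.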