I am currently taking a course that teaches textual analysis in R. As I am fairly new to R, I have not yet figured out how to remove everything after a specific set of characters.
For example, given the following:
documentName <- "Hello my name is Johann my had is the largest to be deleted X"
My desired outcome is:
documentName <- "Hello my name is Johann"
So far I have tried the following but it is not getting me anywhere.
gsub("(\Johann).*\\","",documentName)
Any hint would be much appreciated.
Here is one way, capturing all content up to and including Johann and replacing the whole string with that capture group:
x <- "Hello my name is Johann my had is the largest to be deleted"
out <- sub("^(.*\\bJohann)\\b.*$", "\\1", x)
out
[1] "Hello my name is Johann"
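As a sketch of how this applies to the variable from the question: the original attempt fails before the regex is even evaluated, because "\J" is not a valid escape in an R string literal. The pattern above avoids that by escaping only the word boundary (\\b). The second call is an assumption about what you would want for inputs that lack the keyword; sub() simply returns them unchanged.

```r
documentName <- "Hello my name is Johann my had is the largest to be deleted X"

# Keep everything up to and including the whole word "Johann";
# "\\1" refers to the text matched by the parenthesised group.
documentName <- sub("^(.*\\bJohann)\\b.*$", "\\1", documentName)
documentName
# [1] "Hello my name is Johann"

# No match means no replacement: the input comes back as-is.
unchanged <- sub("^(.*\\bJohann)\\b.*$", "\\1", "No keyword here")
unchanged
# [1] "No keyword here"
```

Note that .* is greedy, so if Johann occurred more than once, the text would be kept up to the last occurrence.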
Another approach, stripping off all content appearing after Johann (the lookbehind requires perl=TRUE):
sub("(?<=\\bJohann)\\s+.*$", "", x, perl=TRUE)
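If the keyword is a plain string rather than a pattern, a regex is not strictly needed. The following is a minimal sketch of a literal-match alternative, not part of the approaches above: cut_after is a hypothetical helper name, and it keeps text through the first literal occurrence of the keyword, without any word-boundary check.

```r
# Hypothetical helper: keep `text` through the first literal occurrence of `key`.
cut_after <- function(text, key) {
  m   <- regexpr(key, text, fixed = TRUE)  # match start per element, -1 if absent
  pos <- as.integer(m)                     # plain integer vector, attributes dropped
  len <- attr(m, "match.length")           # length of each match
  # Leave elements without the keyword untouched.
  ifelse(pos == -1L, text, substr(text, 1L, pos + len - 1L))
}

x <- "Hello my name is Johann my had is the largest to be deleted"
cut_after(x, "Johann")
# [1] "Hello my name is Johann"
```

Because fixed = TRUE treats the keyword literally, characters such as "." or "(" in the keyword need no escaping, which can be convenient when the keyword comes from user input.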