Tags: r, text, replace, finance, sec

How to cut all Lines/Characters in R after specific Characters


I am currently taking a course that teaches textual analysis in R. As I am fairly new to R, I have not yet figured out how to cut all lines or characters that come after a specific set of characters.

For example, I have the following given:

documentName <- "Hello my name is Johann my had is the largest to be deleted X"

My desired outcome is:

documentName <- "Hello my name is Johann"

So far I have tried the following, but it is not getting me anywhere.

gsub("(\Johann).*\\","",documentName)

Any hint would be much appreciated.


Solution

  • Here is one way, capturing everything up to and including Johann and replacing the whole string with that capture:

    x <- "Hello my name is Johann my had is the largest to be deleted"
    out <- sub("^(.*\\bJohann)\\b.*$", "\\1", x)
    out
    
    [1] "Hello my name is Johann"
    

    Another approach, using a lookbehind (which requires perl=TRUE) to strip off all content appearing after Johann:

    sub("(?<=\\bJohann)\\s+.*$", "", x, perl=TRUE)
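
As a follow-up, here is a minimal sketch of two further options. The first is a small repair of the pattern from the question: `"\J"` is not a valid escape in an R string, and the trailing `\\` leaves a dangling backslash in the regex. The second, `cut_after`, is a hypothetical helper (the name and signature are my own, not from the answer) that uses a literal search via `regexpr(..., fixed = TRUE)` instead of a regex. It needs no escaping of the keyword, but unlike the `\\b` patterns above it does not enforce word boundaries.

```r
documentName <- "Hello my name is Johann my had is the largest to be deleted X"

# Repaired version of the attempted pattern: capture "Johann", match the
# rest of the string, and put the captured name back with "\\1".
fixed_attempt <- sub("(Johann).*", "\\1", documentName)

# Hypothetical helper: keep everything up to and including the first
# literal occurrence of `keyword`; return the text unchanged if absent.
cut_after <- function(text, keyword) {
  # as.integer() drops the match attributes and leaves the start position
  # (-1 where there is no match)
  start <- as.integer(regexpr(keyword, text, fixed = TRUE))
  end <- start + nchar(keyword) - 1L
  ifelse(start == -1L, text, substr(text, 1L, end))
}

helper_result <- cut_after(documentName, "Johann")
```

Both `fixed_attempt` and `helper_result` should hold the desired string. Note that `sub()` only acts on the first match, which is what is wanted here because `.*` already consumes the rest of the string.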