
How to remove punctuation inside brackets in R


I am trying to split documents into sentences, but punctuation inside brackets produces some strange results. So I'd like to remove any punctuation that appears inside brackets.

Example input:

A <- c('How to remove all punctuations(like this?) in side it?')

Desired output:

"How to remove all punctuations(like this) in side it?"

Solution

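  • Perhaps something like this, using a positive lookahead. The lookahead (?=\\)) requires the next character to be a closing parenthesis without consuming it, so only the punctuation is deleted and the bracket stays:

    gsub("[?!;,.](?=\\))", "", A, perl = TRUE)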
    #[1] "How to remove all punctuations(like this) in side it?"
    

    Or, using a POSIX character class, which matches any punctuation character rather than a hand-picked set:

    gsub("[[:punct:]](?=\\))", "", A, perl = TRUE)
    

    Or, if you also need to match other types of closing brackets (e.g. square or curly):

    gsub("[[:punct:]](?=[)\\]}])", "", A, perl = TRUE)
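To see how the three patterns differ, you can run them side by side on a few strings. The sample strings below are made up for illustration. The explicit class [?!;,.] leaves characters such as * alone, [[:punct:]] removes them, and only the last pattern also looks ahead for ] and }. One caveat: [[:punct:]] also contains the brackets themselves, so with the second pattern an empty pair () would lose its opening (.

```r
# Hypothetical sample strings, one per case discussed in the answer.
x <- c("punctuations(like this?) in side it?",  # "?" right before ")"
       "see the note (footnote*) here",         # "*" is not in [?!;,.]
       "a range [1, 2.] and {a;}")              # square and curly brackets

# Only the listed characters, and only when a ")" follows.
explicit <- gsub("[?!;,.](?=\\))", "", x, perl = TRUE)

# Any punctuation character, still only when a ")" follows.
posix <- gsub("[[:punct:]](?=\\))", "", x, perl = TRUE)

# Any punctuation character followed by ")", "]" or "}".
any_close <- gsub("[[:punct:]](?=[)\\]}])", "", x, perl = TRUE)

print(explicit)
print(posix)
print(any_close)
```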
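All of these patterns delete only the single character directly before a closing bracket, so text such as (e.g., this one!) keeps its inner periods and comma. If you need to remove every punctuation mark inside the brackets, one option is to combine a negative lookahead that skips the brackets with a lookahead that scans forward to a ) without crossing any other bracket. This is a sketch that assumes the parentheses are balanced and not nested. The pattern name inside_parens and the sample strings are hypothetical.

```r
# Hypothetical pattern: delete every punctuation character inside (...).
# Assumes parentheses are balanced and never nested.
#   (?![()])       skip the brackets themselves
#   [[:punct:]]    any punctuation character
#   (?=[^()]*\\))  a ")" lies ahead with no "(" or ")" in between,
#                  i.e. the character is inside a bracketed group
inside_parens <- "(?![()])[[:punct:]](?=[^()]*\\))"

x <- c("How to remove all punctuations(like this?) in side it?",
       "a list (e.g., this one!) here.",
       "two (a.b) and (c;d) groups, ok?")

y <- gsub(inside_parens, "", x, perl = TRUE)
print(y)
```

Punctuation outside the brackets (the trailing "?", ".", and ",") is left untouched because no ) follows it before the next bracket or the end of the string. To cover [ ] and { } as well, those characters would need to be added to all three character classes. With nested brackets, punctuation that comes before an inner ( would be left in place.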