Tags: r, regex, gsub, punctuation

Remove all punctuation except apostrophes in R


I'd like to use R's gsub to remove all punctuation from a text except for apostrophes. I'm fairly new to regex but am learning.

Example:

x <- "I like %$@to*&, chew;: gum, but don't like|}{[] bubble@#^)( gum!?"
gsub("[[:punct:]]", "", as.character(x))

Current Output (no apostrophe in don't)

[1] "I like to chew gum but dont like bubble gum"

Desired Output (the apostrophe in don't should stay)

[1] "I like to chew gum but don't like bubble gum"

Solution

  • x <- "I like %$@to*&, chew;: gum, but don't like|}{[] bubble@#^)( gum!?"
    gsub("[^[:alnum:][:space:]']", "", x)
    
    [1] "I like to chew gum but don't like bubble gum"
    

    The regex above is straightforward. The leading caret (^) negates the bracket expression, so it matches every character that is not alphanumeric, whitespace, or an apostrophe, and gsub replaces each match with an empty string.
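
For comparison, here is a minimal sketch that runs the negated-class regex next to one possible alternative: keep [[:punct:]] but skip apostrophes with a negative lookahead. The lookahead variant needs perl = TRUE, and the variable names keep_class and skip_apostrophe are only illustrative. On this ASCII input both should give the same string, though they can differ on other input, since the negated class also drops any character that is neither punctuation nor alphanumeric nor whitespace.

```r
# Same input as in the question.
x <- "I like %$@to*&, chew;: gum, but don't like|}{[] bubble@#^)( gum!?"

# Approach from the answer: delete every character that is NOT
# alphanumeric, whitespace, or an apostrophe.
keep_class <- gsub("[^[:alnum:][:space:]']", "", x)

# Alternative sketch (requires perl = TRUE): delete punctuation, but the
# negative lookahead (?!') refuses to match when the character is an apostrophe.
skip_apostrophe <- gsub("(?!')[[:punct:]]", "", x, perl = TRUE)

print(keep_class)
print(identical(keep_class, skip_apostrophe))
```

The negated class is a whitelist (you list what to keep), while the lookahead is a blacklist with an exception, so which one fits better depends on whether unexpected characters should be kept or dropped.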