
Improving the readability of automated text generation based on a database query


I am trying to improve the readability of automated text generation based on a database query.

Is there a neat way to perform these substitutions, ideally in one command instead of six?

x <- c("Te( )st", "Test()", "Test ()", "Test ( )", "Test ,,", "Test,, ", "Test , ")
out <- c("Test", "Test", "Test", "Test", "Test,", "Test, ", "Test,")  # desired result

x <- gsub(pattern = "( ", replacement = "(", x, fixed = TRUE)  # drop space after "("
x <- gsub(pattern = " )", replacement = ")", x, fixed = TRUE)  # drop space before ")"
x <- gsub(pattern = " ,", replacement = ",", x, fixed = TRUE)  # drop space before ","
x <- gsub(pattern = "()", replacement = "", x, fixed = TRUE)   # remove empty parentheses
x <- gsub(pattern = ",,", replacement = ",", x, fixed = TRUE)  # collapse doubled commas
x <- gsub(pattern = " ,", replacement = ",", x, fixed = TRUE)  # same step as above, repeated

Solution

  • You can do this with a single gsub call and a PCRE pattern (perl=TRUE):

    x<-c("Te( )st", "Test()", "Test ()", "Test ( )", "Test ,,", "Test,, ", "Test , ")
    gsub("\\(\\s*\\)|\\s+(?=[,)])|(?<=\\()\\s+|(,),+", "\\1", x, perl=TRUE)
    # => [1] "Test"   "Test"   "Test "  "Test "  "Test,"  "Test, " "Test, "
    

    Details of the pattern:

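    • \(\s*\)| - a literal (, zero or more whitespace characters, and a literal ), or
    • \s+(?=[,)])| - one or more whitespace characters immediately followed by a , or ) (a lookahead, so the , or ) itself is not consumed), or
    • (?<=\()\s+| - one or more whitespace characters immediately preceded by a ( (a lookbehind, so the ( is kept), or
    • (,),+ - a comma captured into Group 1, followed by one or more further commas.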

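    The replacement is the Group 1 backreference \1 (written "\\1" in an R string). When the last alternative matches, Group 1 holds a single comma, so a run of commas collapses to one. When any other alternative matches, Group 1 is empty, so the match is simply removed.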
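
As a quick sanity check, the sketch below builds the same pattern from its four alternatives and compares the one-pass result with the six fixed-string gsub calls from the question. The names `parts`, `pattern`, `clean_one_pass`, and `clean_six_steps` are made up for this illustration. Only base R is used.

```r
x <- c("Te( )st", "Test()", "Test ()", "Test ( )", "Test ,,", "Test,, ", "Test , ")

# The four alternatives from the answer; the element names are illustrative only.
parts <- c(
  empty_parens       = "\\(\\s*\\)",    # "(" + optional whitespace + ")"
  space_before_punct = "\\s+(?=[,)])",  # whitespace followed by "," or ")"
  space_after_open   = "(?<=\\()\\s+",  # whitespace preceded by "("
  repeated_commas    = "(,),+"          # first comma kept in Group 1
)
pattern <- paste(parts, collapse = "|")

# One pass: every match is replaced by Group 1 (a comma or nothing).
clean_one_pass <- function(s) gsub(pattern, "\\1", s, perl = TRUE)

# The six sequential fixed-string replacements from the question.
clean_six_steps <- function(s) {
  s <- gsub("( ", "(", s, fixed = TRUE)
  s <- gsub(" )", ")", s, fixed = TRUE)
  s <- gsub(" ,", ",", s, fixed = TRUE)
  s <- gsub("()", "", s, fixed = TRUE)
  s <- gsub(",,", ",", s, fixed = TRUE)
  s <- gsub(" ,", ",", s, fixed = TRUE)
  s
}

one_pass  <- clean_one_pass(x)
six_steps <- clean_six_steps(x)

print(one_pass)
print(identical(one_pass, six_steps))  # TRUE for this sample vector
```

The two approaches agree on this sample, but they are not equivalent in general: the regex also collapses longer runs (for example three commas in a row, or several spaces inside parentheses) that the fixed-string calls only shorten.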