I am trying to improve the readability of text that is generated automatically from a database query.
Is there a neat way to perform these substitutions, i.e. to do the following in one command instead of six?
```r
x <- c("Te( )st", "Test()", "Test ()", "Test ( )", "Test ,,", "Test,, ", "Test , ")
# desired result
out <- c("Test", "Test", "Test", "Test", "Test,", "Test, ", "Test,")

x <- gsub(pattern = "( ", replacement = "(", x, fixed = TRUE)
x <- gsub(pattern = " )", replacement = ")", x, fixed = TRUE)
x <- gsub(pattern = " ,", replacement = ",", x, fixed = TRUE)
x <- gsub(pattern = "()", replacement = "", x, fixed = TRUE)
x <- gsub(pattern = ",,", replacement = ",", x, fixed = TRUE)
x <- gsub(pattern = " ,", replacement = ",", x, fixed = TRUE)
```
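If the goal is mainly to avoid six separate statements, a minimal sketch is to keep the fixed-string pairs in a list and fold `gsub()` over them with base R's `Reduce()`. The names `subs` and `clean` are illustrative only; this is still six passes over the data, just written as one call, and the pairs are applied in the order listed.

```r
# Input strings from the question
x <- c("Te( )st", "Test()", "Test ()", "Test ( )", "Test ,,", "Test,, ", "Test , ")

# The six fixed-string substitutions as (pattern, replacement) pairs, in order.
# Order matters: the space inside "( )" must be gone before "()" can match.
subs <- list(
  c("( ", "("),
  c(" )", ")"),
  c(" ,", ","),
  c("()", ""),
  c(",,", ","),
  c(" ,", ",")
)

# Fold gsub() over the pairs: each step receives the previous step's result
clean <- Reduce(function(s, p) gsub(p[1], p[2], s, fixed = TRUE), subs, x)
clean
# same values as c("Test", "Test", "Test ", "Test ", "Test,", "Test, ", "Test, ")
```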
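You can use a single call with a PCRE regex (`perl = TRUE` is required because R's default TRE engine does not support the lookarounds used here):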
```r
x <- c("Te( )st", "Test()", "Test ()", "Test ( )", "Test ,,", "Test,, ", "Test , ")
gsub("\\(\\s*\\)|\\s+(?=[,)])|(?<=\\()\\s+|(,),+", "\\1", x, perl = TRUE)
# => [1] "Test" "Test" "Test " "Test " "Test," "Test, " "Test, "
```
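To check that the single call reproduces the six-step chain, a small sketch can wrap both in functions and compare their results. The helper names `chain()` and `one_shot()` are hypothetical, introduced only for this comparison.

```r
x <- c("Te( )st", "Test()", "Test ()", "Test ( )", "Test ,,", "Test,, ", "Test , ")

# The question's six fixed-string steps, applied in sequence
chain <- function(s) {
  s <- gsub("( ", "(", s, fixed = TRUE)
  s <- gsub(" )", ")", s, fixed = TRUE)
  s <- gsub(" ,", ",", s, fixed = TRUE)
  s <- gsub("()", "", s, fixed = TRUE)
  s <- gsub(",,", ",", s, fixed = TRUE)
  gsub(" ,", ",", s, fixed = TRUE)
}

# The single PCRE call (perl = TRUE enables the lookarounds)
one_shot <- function(s) {
  gsub("\\(\\s*\\)|\\s+(?=[,)])|(?<=\\()\\s+|(,),+", "\\1", s, perl = TRUE)
}

identical(chain(x), one_shot(x))  # TRUE
```

On these seven inputs both give `"Test" "Test" "Test " "Test " "Test," "Test, " "Test, "`. They are not equivalent in general, though: for `"Test ,,,"` the chain leaves `"Test,,"` (the fixed `",,"` replacement does not rescan its own output), while `(,),+` collapses the whole run to `"Test,"`.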
See the R demo online and the regex demo. Details:
- `\(\s*\)` - a `(`, zero or more whitespace characters and then a `)`, or
- `\s+(?=[,)])` - one or more whitespace characters followed by either `,` or `)`, or
- `(?<=\()\s+` - one or more whitespace characters immediately preceded by a `(` char, or
- `(,),+` - a comma captured into Group 1 and then one or more commas.

The replacement is the Group 1 value: if Group 1 matched, the replacement is a single comma; otherwise, it is an empty string.
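To see which inputs each branch of the alternation can match, one option is to test the four alternatives separately with `grepl()`. The names in `alternatives` are made up for this illustration. The last line shows the Group 1 behaviour on a hypothetical string: a branch without a group deletes its match, while the comma branch leaves a single comma.

```r
x <- c("Te( )st", "Test()", "Test ()", "Test ( )", "Test ,,", "Test,, ", "Test , ")

# The four branches of the combined pattern, tested one at a time
alternatives <- c(
  empty_parens = "\\(\\s*\\)",
  ws_before    = "\\s+(?=[,)])",
  ws_after     = "(?<=\\()\\s+",
  comma_run    = "(,),+"
)

# Logical matrix: one row per input, one column per branch
hits <- sapply(alternatives, grepl, x = x, perl = TRUE)
rownames(hits) <- x
hits

# Group 1 only takes part in the comma branch; when the other branch matches,
# "\\1" expands to an empty string, so that match is simply removed
gsub("(,),+|\\s+(?=,)", "\\1", "a ,,, b", perl = TRUE)  # "a, b"
```

Several branches can match the same input (for `"Te( )st"` the first three all do), but in the combined pattern the engine tries the alternatives left to right at each position. The `(` is reached first, so `\(\s*\)` removes `( )` as a whole before the whitespace branches get a chance.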