Regular expression matching a comma bounded by non-whitespace


In R, I am trying to replace every comma that has a non-whitespace character on both sides with a single space, while leaving all other commas untouched.

Imagine I have:

j <- "Abc,Abc, and c"

and I want:

"Abc Abc, and c"

This almost works:

gsub("[^ ],[^ ]", " ", j)

But it also removes the character on either side of the comma, because each [^ ] consumes the character it matches:

"Ab bc, and c"

Solution

  • You may use a PCRE regex (enabled with perl = TRUE, since R's default regex engine does not support lookarounds) with a negative lookbehind and a negative lookahead:

    j <- "Abc,Abc, and c"
    gsub("(?<!\\s),(?!\\s)", " ", j, perl = TRUE)
    ## => [1] "Abc Abc, and c"
    

    Details:

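    • (?<!\\s) - a negative lookbehind: there cannot be a whitespace character immediately before the , (the backslash is doubled because the pattern is an R string literal)
    • , - a literal ,
    • (?!\\s) - a negative lookahead: there cannot be a whitespace character immediately after the ,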
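
Because lookarounds are zero-width, only the comma itself is matched, which matters when commas sit close together. The sketch below is an illustration, not part of the original answer: the input x is hypothetical, and the capture-group pattern is just one obvious way to repair the attempt from the question. It shows the capture-group version skipping a comma whose left neighbour was already consumed by the previous match.

```r
# Hypothetical input: single characters separated by commas.
x <- "a,b,c, and d"

# Capture-group variant: the neighbours are matched and then put back.
# After "a,b" is matched, "b" is already consumed, so ",c" is skipped.
with_groups <- gsub("([^ ]),([^ ])", "\\1 \\2", x)

# Lookaround variant: only the comma is matched, so every qualifying
# comma is replaced.
with_lookarounds <- gsub("(?<!\\s),(?!\\s)", " ", x, perl = TRUE)

print(with_groups)       # [1] "a b,c, and d"
print(with_lookarounds)  # [1] "a b c, and d"
```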

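    An alternative solution is to match a , that is enclosed with word boundaries. Note that this is stricter than the lookarounds: it requires a word character (letter, digit or underscore) directly on each side, so a comma next to punctuation, e.g. after a closing parenthesis, is left untouched. For the sample input both give the same result: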

    j <- "Abc,Abc, and c"
    gsub("\\b,\\b", " ", j)
    ## => [1] "Abc Abc, and c"
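
The two patterns agree on the sample string but can differ once punctuation is involved. As a rough illustration (the input y is hypothetical, and perl = TRUE is used for both calls here only so that the same engine handles both patterns), a comma that follows a closing parenthesis is replaced by the lookaround pattern but not by the word-boundary one:

```r
# Hypothetical input: the first comma follows ")" and precedes "g".
y <- "f(x),g(y), and h"

# Lookarounds: ")" and "g" are both non-whitespace, so the comma matches.
by_lookarounds <- gsub("(?<!\\s),(?!\\s)", " ", y, perl = TRUE)

# Word boundaries: there is no boundary between ")" and ",", so no match.
by_boundaries <- gsub("\\b,\\b", " ", y, perl = TRUE)

print(by_lookarounds)  # [1] "f(x) g(y), and h"
print(by_boundaries)   # [1] "f(x),g(y), and h"
```

Which one fits depends on whether "bounded by non-whitespace" or "bounded by word characters" is the rule you actually need.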
    