
Regex for replacing commas only within square brackets


I have a text file containing comma-separated strings, but some of the items are themselves bracketed lists of the form [*,*,*,...]. For example:

"Hello", "Goodbye", ["Yes", "No", "Maybe"], "Indeed", ["Why", "What"]

I want to replace only the commas inside square brackets with semicolons. There can be any number of bracketed groups and any number of commas inside each one.

I tried the following code in R, but it's not working as planned: some commas outside the brackets are being replaced as well.

repeat {
  # replace the last comma found within square brackets with a semicolon
  tmp <- gsub("(\\[.*\\K),(?=.*\\])", ";", tmp, perl = TRUE)
  # repeat until no more such commas are found
  if (sum(grepl("(\\[.*\\K),(?=.*\\])", tmp, perl = TRUE)) == 0) {
    break
  }
}

Can anyone help with regex that can solve this problem? Thanks!


Solution

  • To replace all commas inside square brackets with semicolons, you may use

    gsub("(?:\\G(?!^)|\\[)[^][,]*\\K,", ";", x, perl=TRUE)
    

    See the regex demo. Note that this regex does not check that a closing ] follows the comma. If that check is required, use

    gsub("(?:\\G(?!^)|\\[)[^][,]*\\K,(?=[^][]*])", ";", x, perl=TRUE)
    

    See another regex demo

    Details

    • (?:\G(?!^)|\[) - end of the previous match (\G(?!^)) or (|) a [ (\[)
    • [^][,]* - 0+ chars other than ], [ and ,
    • \K - match reset operator that discards all the text matched so far
    • , - a comma
    • (?=[^][]*]) - a positive lookahead that requires 0+ chars other than [ and ] and a ] immediately to the right of the current location.
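
As a quick check, both patterns above can be run against the sample line from the question. This is only a minimal sketch: the variable names (x, y, pat_open, pat_closed) are made up for illustration, the sample string assumes the missing quote after Maybe in the question is a typo, and y is a hypothetical input with an unclosed bracket, included only to show where the two patterns behave differently.

```r
# Sample line from the question (single quotes, so the inner double
# quotes need no escaping). Assumes "Maybe" was meant to be closed.
x <- '"Hello", "Goodbye", ["Yes", "No", "Maybe"], "Indeed", ["Why", "What"]'

# Pattern 1: does not check for a closing ]
pat_open <- "(?:\\G(?!^)|\\[)[^][,]*\\K,"
# Pattern 2: also requires a ] ahead, with no [ or ] in between
pat_closed <- "(?:\\G(?!^)|\\[)[^][,]*\\K,(?=[^][]*])"

res_open <- gsub(pat_open, ";", x, perl = TRUE)
res_closed <- gsub(pat_closed, ";", x, perl = TRUE)

writeLines(res_open)
# "Hello", "Goodbye", ["Yes"; "No"; "Maybe"], "Indeed", ["Why"; "What"]
writeLines(res_closed)
# "Hello", "Goodbye", ["Yes"; "No"; "Maybe"], "Indeed", ["Why"; "What"]

# Hypothetical input with an unclosed bracket: only pattern 1 replaces here
y <- '["a", "b", "c"'
y_open <- gsub(pat_open, ";", y, perl = TRUE)
y_closed <- gsub(pat_closed, ";", y, perl = TRUE)

writeLines(y_open)
# ["a"; "b"; "c"
writeLines(y_closed)
# ["a", "b", "c"
```

On well-formed input the two patterns give the same result. The lookahead version only matters when a [ may appear without its matching ].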