I have a text file that contains comma-separated strings. Some of the comma-separated items are themselves of the form [*,*,*,...]. For example:
"Hello", "Goodbye", ["Yes", "No", "Maybe"], "Indeed", ["Why", "What"]
I want to be able to parse the file to replace only commas within square brackets with a semicolon. There can be any number of brackets and any number of commas within the brackets.
I tried the following code in R, but it's not working as planned; some commas outside my brackets are being replaced:
repeat {
  # replace last comma found within brackets with semicolon
  tmp <- gsub("(\\[.*\\K),(?=.*\\])", ";", tmp, perl = TRUE)
  # repeat until no more commas found
  if (sum(grepl("(\\[.*\\K),(?=.*\\])", tmp, perl = TRUE)) == 0) {
    break
  }
}
Can anyone help with regex that can solve this problem? Thanks!
To replace all commas inside square brackets with semi-colons, you may use
gsub("(?:\\G(?!^)|\\[)[^][,]*\\K,", ";", x, perl=TRUE)
See the regex demo. The regex above does not check for the closing ], though. If that check is required, use
gsub("(?:\\G(?!^)|\\[)[^][,]*\\K,(?=[^][]*])", ";", x, perl=TRUE)
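As a quick check, both patterns can be run against the sample line from the question. This is only a sketch: the variable names (x, y, p_open, p_closed, res_open, res_closed) are made up for illustration, the sample assumes the quote after Maybe was meant to be closed, and the second input y is a hypothetical line with an unclosed bracket added to show where the two patterns differ.

```r
# Sample input from the question (assuming the quote after Maybe is closed).
x <- '"Hello", "Goodbye", ["Yes", "No", "Maybe"], "Indeed", ["Why", "What"]'

# Pattern 1: replaces commas that follow an opening [,
# without checking that a closing ] exists.
p_open <- "(?:\\G(?!^)|\\[)[^][,]*\\K,"
# Pattern 2: same, plus a lookahead that requires a closing ] further right.
p_closed <- "(?:\\G(?!^)|\\[)[^][,]*\\K,(?=[^][]*])"

res_open <- gsub(p_open, ";", x, perl = TRUE)
res_closed <- gsub(p_closed, ";", x, perl = TRUE)
print(res_closed)
# [1] "\"Hello\", \"Goodbye\", [\"Yes\"; \"No\"; \"Maybe\"], \"Indeed\", [\"Why\"; \"What\"]"

# Hypothetical input with an unclosed bracket.
y <- "a, [b, c"
gsub(p_open, ";", y, perl = TRUE)    # "a, [b; c" (comma after b is replaced)
gsub(p_closed, ";", y, perl = TRUE)  # "a, [b, c" (nothing is replaced)
```

On well-formed input the two patterns give the same result; they only differ when a [ is never closed.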
Details
- (?:\G(?!^)|\[) - the end of the previous match (\G(?!^)) or (|) a [ (\[)
- [^][,]* - 0+ chars other than [, ] and ,
- \K - match reset operator that discards all the text matched so far
- , - a comma
- (?=[^][]*]) - a positive lookahead that requires 0+ chars other than [ and ], and then a ], immediately to the right of the current location.
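Since the question starts from a text file, the pattern can be applied line by line with readLines and writeLines. The following is a minimal sketch, not a definitive implementation: the helper name replace_bracket_commas and the paths in_path and out_path are hypothetical, a temporary file stands in for the real input, and it assumes that no bracketed group spans a line break.

```r
# Hypothetical helper: swap commas inside [...] for semicolons.
# gsub is vectorised, so each line is processed independently.
replace_bracket_commas <- function(lines) {
  gsub("(?:\\G(?!^)|\\[)[^][,]*\\K,(?=[^][]*])", ";", lines, perl = TRUE)
}

# Stand-in for the real input file: a temporary file with two sample lines.
in_path <- tempfile(fileext = ".txt")
writeLines(c('"Hello", ["Yes", "No"], "Indeed"', '"A", "B"'), in_path)

# Read the lines, convert them, and write the result to a second file.
out_path <- tempfile(fileext = ".txt")
writeLines(replace_bracket_commas(readLines(in_path)), out_path)

# Inspect the converted lines.
readLines(out_path)
```

The second sample line has no brackets, so it should come back unchanged.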