I'm trying to write a regular expression to match double-quotes inside CSV fields (and replace them with escaped double-quotes, but the replacement part is easy).
So I want:
"field1";"field2" -> "field1";"field2"
"field1";"fie"ld2" -> "field1";"fie""ld2"
I'm using (?<!;)"(?!;) as my matching expression. It nearly works, but it doesn't handle the quote at the start or the end of the line. I need something like either (?<![;$])"(?![;^]), which doesn't work because inside a character class $ and ^ match the literal characters instead of acting as anchors, or (?<!(;|$))"(?!(;|^)), which also doesn't work because a negative lookbehind can't be variable length.
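To see both problems concretely, here is a minimal sketch using Python's built-in re module as a stand-in for PCRE (these particular lookbehinds and character classes behave the same way in both). It assumes fields are separated by ;, as the patterns above do:

```python
import re

# Fields are assumed to be separated by ';', as in the patterns above.
line = '"field1";"fie"ld2"'

# First attempt: a quote not preceded by ';' and not followed by ';'.
naive = re.compile(r'(?<!;)"(?!;)')
# The opening and closing quotes of the line are doubled as well.
print(naive.sub('""', line))  # ""field1";"fie""ld2""

# Inside a character class, '$' is a literal dollar sign, not an anchor,
# so this lookbehind does not exclude the start of the line.
literal = re.compile(r'(?<![;$])"')
print(literal.search(line).start())  # 0: the opening quote still matches
```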
What's the correct way of doing this please?
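The PCRE regex engine does not allow a group inside a lookbehind to contain alternatives of different lengths: in (?<!(;|$)), the ; alternative is one character long while the anchor is zero-width. Top-level alternatives of a lookbehind may have different fixed lengths, though, so it is enough to drop the group. Note also that the start-of-string anchor ^ belongs in the lookbehind and the end-of-string anchor $ in the lookahead.
You can rephrase the regex you tried as either of these: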
(?<!;|^)"(?!;|$)
(?<=[^;])"(?=[^;])
The (?<!;|^)"(?!;|$) pattern matches a " char that is not at the start of the string and not immediately preceded by a ; char (due to (?<!;|^)), and that is not immediately followed by a ; char and not at the end of the string (see (?!;|$)).
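As a hedged illustration outside PCRE: Python's built-in re module is stricter and rejects (?<!;|^) because its two branches have different widths (1 and 0). A sketch of an equivalent that it accepts stacks two fixed-width lookbehinds instead. The helper name escape_inner_quotes is hypothetical, and ; is assumed to be the field separator:

```python
import re

# Python's re rejects (?<!;|^): the branches have widths 1 and 0.
try:
    re.compile(r'(?<!;|^)"(?!;|$)')
    print("compiled")
except re.error as exc:
    print("re.error:", exc)

# Equivalent form: two stacked fixed-width lookbehinds
# ("not after ';'" and "not at the start") plus the original lookahead.
INNER_QUOTE = re.compile(r'(?<!;)(?<!^)"(?!;|$)')


def escape_inner_quotes(line: str) -> str:
    """Hypothetical helper: double every quote that is not a field delimiter."""
    return INNER_QUOTE.sub('""', line)


print(escape_inner_quotes('"field1";"field2"'))   # "field1";"field2"
print(escape_inner_quotes('"field1";"fie"ld2"'))  # "field1";"fie""ld2"
```

Stacked lookarounds are all checked at the same position, so (?<!;)(?<!^) means "neither after ; nor at the start", which is the same condition as (?<!;|^).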
The (?<=[^;])"(?=[^;]) regex matches a " that is immediately preceded by a char other than ; (so the start-of-string position is not allowed) and that is immediately followed by a char other than ; (so the end-of-string position is not allowed).
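The character-class version contains no alternation, so it also compiles as-is in Python's built-in re module. One caveat: [^;] matches a newline too, so if the whole file is processed at once, the closing quote of one line and the opening quote of the next would count as inner quotes. The sketch below therefore applies the pattern line by line; the helper name and the line splitting are assumptions for illustration:

```python
import re

# Fixed-width lookarounds: the quote needs one non-';' character on each side,
# so the first and last quote of a line can never match.
INNER_QUOTE_CLASS = re.compile(r'(?<=[^;])"(?=[^;])')


def escape_inner_quotes_per_line(text: str) -> str:
    # Hypothetical helper. [^;] also matches '\n', so the pattern is applied
    # to each line separately instead of to the whole text.
    return "\n".join(INNER_QUOTE_CLASS.sub('""', line) for line in text.split("\n"))


data = '"field1";"field2"\n"field1";"fie"ld2"'
print(escape_inner_quotes_per_line(data))
# "field1";"field2"
# "field1";"fie""ld2"
```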