regex linux pcre

Regex to match quotes in CSV inside fields


I'm trying to write a regular expression to match double-quotes inside CSV fields (and replace them with escaped double-quotes, but the replacement part is easy).

So I want

"field1","field2" -> "field1","field2"
"field1","fie"ld2" -> "field1","fie""ld2"

I'm using (?<!;)"(?!;) as my matching expression, which nearly works, but it doesn't handle the quote at the start or the end of the line. I need something like either (?<![;$])"(?![;^]), which doesn't work because $ and ^ are matched as literal characters inside a character class, or (?<!(;|$))"(?!(;|^)), which also doesn't work because a negative lookbehind can't be variable length.

What's the correct way of doing this please?


Solution

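  • PCRE has traditionally required every top-level alternative in a lookbehind to match a fixed length. The alternatives may differ in length from each other only at the top level of the lookbehind, not inside a group, which is why (?<!(;|$)) is rejected.

    You can rephrase the regex you tried as either of the following (note that ^ belongs in the lookbehind and $ in the lookahead, the reverse of your attempt):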

    (?<!;|^)"(?!;|$)
    (?<=[^;])"(?=[^;])
    

    See the regex demo and this regex demo.

    The (?<!;|^)"(?!;|$) pattern matches a " char that is neither at the start of the string nor immediately preceded by a ; char (due to (?<!;|^)), and that is neither immediately followed by a ; char nor at the end of the string (see (?!;|$)).

    The (?<=[^;])"(?=[^;]) regex matches a " that is immediately preceded by a char other than ; (so the start-of-string position is excluded) and immediately followed by a char other than ; (so the end-of-string position is excluded). Since [^;] also matches a line break, this variant is best applied one line at a time.
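
As an illustration of the replacement step, here is a minimal sketch in Python, assuming Python's re module as a stand-in for PCRE. The function name escape_inner_quotes and its delim parameter are hypothetical and not part of the original answer. It uses the second pattern, whose lookarounds each consume exactly one character, and assumes the input is processed one line at a time with a single-character delimiter.

```python
import re


def escape_inner_quotes(line: str, delim: str = ";") -> str:
    """Double every double-quote that has a non-delimiter char on both sides.

    Hypothetical helper: `delim` is assumed to be a single character and
    `line` is assumed to be one CSV record without a trailing newline.
    """
    d = re.escape(delim)
    # (?<=[^d]) : preceded by a char that is not the delimiter
    # (?=[^d])  : followed by a char that is not the delimiter
    # Quotes at the very start or end of the line have no char on one side,
    # so neither lookaround can succeed there.
    pattern = re.compile(rf'(?<=[^{d}])"(?=[^{d}])')
    return pattern.sub('""', line)


if __name__ == "__main__":
    print(escape_inner_quotes('"field1";"field2"'))
    print(escape_inner_quotes('"field1";"fie"ld2"'))
    print(escape_inner_quotes('"field1","fie"ld2"', delim=","))
```

This sketch is not idempotent: a quote that is already doubled inside a field is doubled again, so it should be run only on input that has not been escaped yet.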