python, regex, single-quotes

Python regex to match quoted string with escaped single quotes


I was using this pattern to match single-quoted strings in a parser:

"'.+?'"

But I need a regex that can find single-quoted strings with Postgres-style escaping of single quotes (doubling the single quote). It needs to match something like this:

"'first', 'sec''ond', 't''hi''rd'"

I want to find the shortest matches for strings that start and end with one single quote, so the string above should yield 3 substrings:

'first'
'sec''ond'
't''hi''rd'

Solution

  • '(?:[^']|'')*' is a working regex for this: it matches a ', followed by zero or more items that are each either a character other than ' or a doubled '', followed by a trailing '.
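
As a quick check, here is a minimal sketch that runs both the pattern from the question and the alternation pattern over the sample string with re.findall. The variable names are only illustrative.

```python
import re

text = "'first', 'sec''ond', 't''hi''rd'"

# Lazy pattern from the question: it stops at the first quote it reaches,
# so every doubled quote splits the string into separate matches.
naive = re.findall(r"'.+?'", text)

# Alternation pattern: between the outer quotes, accept either a
# non-quote character or a doubled quote.
fixed = re.findall(r"'(?:[^']|'')*'", text)

print(naive)  # ["'first'", "'sec'", "'ond'", "'t'", "'hi'", "'rd'"]
print(fixed)  # ["'first'", "'sec''ond'", "'t''hi''rd'"]
```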

    However, to make it more efficient, you can rewrite it using the unroll-the-loop technique:

    '[^']*(?:''[^']*)*'
    

    See the regex demo and note how many steps each regex takes to find all matches.

    The regex can be read as

    • ' - match a '
    • [^']* - then zero or more characters other than '
    • (?:''[^']*)* - then zero or more sequences of '' followed by zero or more characters other than '
    • ' - and then match the trailing '.
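
The breakdown above can be mirrored in Python with re.VERBOSE, which lets each part of the pattern carry its own comment. This is only a sketch: QUOTED, find_quoted and unescape are hypothetical names, and unescape is an extra convenience that assumes you also want the literal's value with the doubled quotes collapsed.

```python
import re

# Unrolled pattern, one commented line per component.
QUOTED = re.compile(r"""
    '                 # opening quote
    [^']*             # zero or more characters other than a quote
    (?: '' [^']* )*   # zero or more: doubled quote, then more non-quote characters
    '                 # closing quote
""", re.VERBOSE)


def find_quoted(text):
    """Return every single-quoted literal in text, outer quotes included."""
    return QUOTED.findall(text)


def unescape(literal):
    """Hypothetical helper: drop the outer quotes and collapse '' to '."""
    return literal[1:-1].replace("''", "'")


literals = find_quoted("'first', 'sec''ond', 't''hi''rd'")
values = [unescape(lit) for lit in literals]
print(literals)  # ["'first'", "'sec''ond'", "'t''hi''rd'"]
print(values)    # ['first', "sec'ond", "t'hi'rd"]
```

Note that this pattern also accepts the empty literal '', which the original '.+?' could not match on its own.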

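    This regex follows a linear path through the input: at each position only one branch can match (a non-quote character or a doubled quote), so it involves as little backtracking as possible.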

    Just a note: you can still make your regex work for the current scenario if you add a lookahead that checks for a , or the end of the string after the trailing ':

    '.+?'(?=,|$)
         ^^^^^^^
    

    See the regex demo. However, this is context-dependent and less efficient than the unrolled regex.
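
To illustrate the context dependence, here is a small sketch comparing the lookahead pattern with the unrolled one. The second input, with parentheses around the list, is a made-up example and does not come from the question.

```python
import re

lookahead = re.compile(r"'.+?'(?=,|$)")
unrolled = re.compile(r"'[^']*(?:''[^']*)*'")

sample = "'first', 'sec''ond', 't''hi''rd'"
wrapped = "('a', 'b')"  # hypothetical input: last literal is followed by ")"

# On the sample from the question, both patterns find the same three literals.
print(lookahead.findall(sample))
print(unrolled.findall(sample))

# Once a literal is followed by something other than "," or the end of the
# string, the lookahead version misses it, while the unrolled one does not.
print(lookahead.findall(wrapped))  # ["'a'"]
print(unrolled.findall(wrapped))   # ["'a'", "'b'"]
```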