python regex ply

Python regular expression on single quoted string with escaped single quotes


Suppose we have input like this (it is only an example; whether it makes sense does not matter):

data = r"(((column_1 + 7.45) * 3) <>    column_2 - ('string\'1' / 2))"

I need to match a string that starts and ends with ' and may contain escaped single quotes, as in the example above, using Python's re module. The result should be string\'1. How can this be done?

EDIT: I am using the PLY library, and the usage should be:

def t_leftOperand_arithmetic_rightOperand_STRING(self, t):
    r'<regex>'
    t.lexer.pop_state()
    return t

Solution

  • I believe you also have to account for the escape character itself being escaped, for example a \\ directly before the closing quote.

    For that, you need '[^'\\]*(?:\\[\S\s][^'\\]*)*'


    Input

    '''Set 1 - this
    is another
    mul\'tiline
    string'''
    '''Set 2 - this
    is' a\\nother
    mul\'''tiline
    st''ring'''
    

    Benchmark:

    Regex1:   '[^'\\]*(?:\\[\S\s][^'\\]*)*'
    Options:  < none >
    Completed iterations:   400  /  400     ( x 1000 )
    Matches found per iteration:   9
    Elapsed Time:    5.00 s,   4995.27 ms,   4995267 µs
    
    
    Regex2:   '(?:[^'\\]|\\.)*'
    Options:  < s >
    Completed iterations:   400  /  400     ( x 1000 )
    Matches found per iteration:   9
    Elapsed Time:    7.00 s,   7000.68 ms,   7000680 µs
    

    Additional regex (for testing only; as @ridgerunner says, this could cause a backtracking problem):

    Regex3:   '(?:[^'\\]+|\\.)*'
    Options:  < s >
    Completed iterations:   400  /  400     ( x 1000 )
    Matches found per iteration:   9
    Elapsed Time:    5.45 s,   5449.72 ms,   5449716 µs
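
As a rough illustration of how the patterns above behave with Python's re module, here is a minimal sketch. The helper name quoted_strings and the second test input are made up for this example. The < s > option from the benchmark is assumed to correspond to re.DOTALL. The data is written as a raw string so that the backslash really is part of the input; in a normal string literal, \' collapses to a plain '.

```python
import re

# The three patterns from the answer. Double-quoted raw strings are used
# because each pattern itself contains single quotes.
UNROLLED = re.compile(r"'[^'\\]*(?:\\[\S\s][^'\\]*)*'")
ALTERNATION = re.compile(r"'(?:[^'\\]|\\.)*'", re.DOTALL)
ALTERNATION_PLUS = re.compile(r"'(?:[^'\\]+|\\.)*'", re.DOTALL)

# Raw string: the backslash before the inner quote stays in the data.
data = r"(((column_1 + 7.45) * 3) <>    column_2 - ('string\'1' / 2))"


def quoted_strings(pattern, text):
    """Return each single-quoted literal in text, without the outer quotes.

    Hypothetical helper for this sketch; not part of re or PLY.
    """
    return [m.group(0)[1:-1] for m in pattern.finditer(text)]


# All three patterns find the same literal in the question's input.
for pattern in (UNROLLED, ALTERNATION, ALTERNATION_PLUS):
    print(quoted_strings(pattern, data))

# An escaped backslash right before a quote does not escape that quote,
# so the first literal ends there and 'b' is matched separately.
print(quoted_strings(UNROLLED, r"'a\\' + 'b'"))
```

In the PLY token function from the question, the same pattern would presumably go in the docstring, written as r"..." rather than r'...' since the pattern contains single quotes. The matched token value would then still include the surrounding quotes, which can be stripped inside the function if needed.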