python regex pypi-regex

Recursive Regex with a Pattern Matching only on Start of Match before Recursion?


I'm trying to match balanced parentheses that may contain further nested parentheses. The regex below does that: for each opening parenthesis it finds the closing one that belongs to it. What I need now is for it to also require a prefix, for example "Test". However, (?R) recurses the entire pattern, so I can't simply put "Test" at the start of the pattern: the prefix would then be required at every nesting level. I want a match only when the parentheses are preceded by "Test", not for any parentheses.

I want to replace the "Test( ... )" with something else, so a single pattern would be preferable, because then I can just use regex.sub().

import regex

# currently matches any balanced parentheses; the goal is to match only those prefixed with 'Test('
regex.search(r'\(([^()]|(?R))*\)', '... () ... Test(123, Test(123, (3), 3))')

Solution

  • This is a case for subroutines. Enclose the part of the pattern that should be recursed in a capturing group, then use the (?1) construct to recurse into that group only:

    import regex
    m = regex.search(r'Test(\((?:[^()]++|(?1))*\))', 'Test(123, Test(123, (3), 3))')
    if m:
        print(m.group()) # => Test(123, Test(123, (3), 3))
    


    Details

    • Test - a prefix word
    • (\((?:[^()]++|(?1))*\)) - Capturing group 1 (that will be recursed with (?1)):
      • \( - a ( char
      • (?:[^()]++|(?1))* - zero or more repetitions of
        • [^()]++ - 1+ chars other than ( and ) (possessive match for better efficiency)
        • | - or
        • (?1) - a subroutine to recurse Capturing group #1 subpattern
      • \) - a ) char.
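
Since the question asks for a single pattern usable with regex.sub(), here is a minimal sketch of that replacement step. It assumes the PyPI regex package is installed; the sample text, the 'REPLACED' placeholder and the 'Other' prefix are made up for illustration.

```python
import regex

# Group 1 holds the balanced-parentheses subpattern. (?1) recurses into
# group 1 only, so the literal prefix is matched once, at the start.
PATTERN = regex.compile(r'Test(\((?:[^()]++|(?1))*\))')

text = '... () ... Test(123, Test(123, (3), 3))'

# Replace the whole outermost "Test( ... )" with a placeholder string.
replaced = PATTERN.sub('REPLACED', text)
print(replaced)  # => ... () ... REPLACED

# Swap only the prefix: a callable replacement receives the match object,
# and everything after the prefix is copied from the matched text.
renamed = PATTERN.sub(lambda m: 'Other' + m.group()[len('Test'):], text)
print(renamed)  # => ... () ... Other(123, Test(123, (3), 3))
```

The nested Test(...) is left alone in the second result because it lies inside the outer match, and sub() only replaces non-overlapping matches. If the prefix must not be the tail of a longer word such as MyTest, adding \b in front of Test is one option.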