
Regex: match an equal number of two characters


I'd like to match the parameters of any function in a string using a regex. As an example, let's assume the following string:

predicate(foo(x.bar, predicate(foo(...), bar)), bar)

This may be part of a longer sequence:

predicate(foo(x.bar, predicate(foo(...), bar)), bar)predicate(foo(x.bar, predicate(foo(...), bar)), bar)predicate(foo(x.bar, predicate(foo(...), bar)), bar)

I now want to find all substrings that represent a function/predicate and its parameters (i.e., in the first example, the whole string as well as the nested predicate(foo(...), bar)). The problem is that I can't simply match with a pattern like this:

predicate\(.*, bar\)

because I may then match more than the predicate's parameters if the * is greedy, or fewer if it is lazy. This happens because such predicate() calls can be nested.
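A quick check with the standard re module illustrates both failure modes on the example string (the three-fold concatenation below stands in for the longer sequence):

```python
import re

# The question's example string and a longer sequence built from it.
s = 'predicate(foo(x.bar, predicate(foo(...), bar)), bar)'
seq = s * 3

# Greedy: .* runs to the end and backtracks to the LAST ", bar)",
# so all three predicates collapse into a single match.
greedy = re.findall(r'predicate\(.*, bar\)', seq)
print(greedy == [seq])  # True

# Lazy: .*? stops at the FIRST ", bar)", which sits inside the nested call,
# leaving the outer call's parentheses unbalanced.
lazy = re.findall(r'predicate\(.*?, bar\)', s)
print(lazy)  # ['predicate(foo(x.bar, predicate(foo(...), bar)']
```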

I need a regex that finds the string predicate(...), where ... matches (lazily) any string with balanced parentheses, i.e. an equal number of ('s and )'s, properly nested.
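Because re cannot count nesting depth, one stdlib-only workaround (a hand-written scanner, not a regex) is to let re locate each candidate start and then walk forward with a depth counter until the parentheses balance. find_calls is a hypothetical helper; this is a sketch assuming the parentheses in the input are meant to nest, and it simply skips calls that never close.

```python
import re


def find_calls(text, name='predicate'):
    """Return every name(...) substring whose parentheses balance.

    Hypothetical helper: re finds each "name(" start, then a depth
    counter walks forward to the matching ")".
    """
    results = []
    # \b keeps e.g. "xpredicate(" from matching when name is "predicate".
    for m in re.finditer(r'\b' + re.escape(name) + r'\(', text):
        depth = 0
        # Start at the opening "(" so the first step sets depth to 1.
        for i in range(m.end() - 1, len(text)):
            if text[i] == '(':
                depth += 1
            elif text[i] == ')':
                depth -= 1
                if depth == 0:
                    results.append(text[m.start():i + 1])
                    break
        # If the loop finishes without depth reaching 0, the call is
        # unterminated and is skipped.
    return results


s = 'predicate(foo(x.bar, predicate(foo(...), bar)), bar)'
print(find_calls(s))
# ['predicate(foo(x.bar, predicate(foo(...), bar)), bar)', 'predicate(foo(...), bar)']
```

Results come back in order of their starting position, so outer calls precede the calls nested inside them.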

If it matters: I am using the re module in Python.


Solution

  • By installing the third-party PyPI package regex, as @Tim Pietzcker suggested, you can use recursive regexes, which the standard re module does not support.

    >>> import regex
    >>> s = 'predicate(foo(x.bar, predicate(foo(...), bar)), bar)'
    >>> pattern = regex.compile(r'(\w+)(?=\(((?:\w+\((?2)\)|[^()])*)\))')
    >>> pattern.findall(s)
    [('predicate', 'foo(x.bar, predicate(foo(...), bar)), bar'),
     ('foo', 'x.bar, predicate(foo(...), bar)'),
     ('predicate', 'foo(...), bar'),
     ('foo', '...')]
    

    You could also constrain it to look for just "predicate":

    >>> pattern = regex.compile(r'(predicate)(?=\(((?:\w+\((?2)\)|[^()])*)\))')
    >>> pattern.findall(s)
    [('predicate', 'foo(x.bar, predicate(foo(...), bar)), bar'),
     ('predicate', 'foo(...), bar')]
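To see why the answer switches packages, a quick check with only the standard library (assuming CPython's re) shows that the same pattern does not even compile there: re has no (?N) subroutine-call syntax, so parsing fails at "(?2" with re.error.

```python
import re

# The recursive pattern from the answer above.
pattern = r'(\w+)(?=\(((?:\w+\((?2)\)|[^()])*)\))'

try:
    re.compile(pattern)
    supported = True
except re.error as exc:
    # re rejects the "(?2" group-recursion construct as an unknown extension.
    supported = False
    print('re.error:', exc)

print(supported)  # False
```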