I'd like to match the parameters of any function as a string using regex. As an example, let's assume the following string:
predicate(foo(x.bar, predicate(foo(...), bar)), bar)
This may be part of a longer sequence:
predicate(foo(x.bar, predicate(foo(...), bar)), bar)predicate(foo(x.bar, predicate(foo(...), bar)), bar)predicate(foo(x.bar, predicate(foo(...), bar)), bar)
I now want to find all substrings that represent a function/predicate and its parameters (i.e. in the first example, the whole string as well as the nested predicate(foo(...), bar)). The problem is that I can't simply match like this
predicate\(.*, bar\)
as I may then match more than the parameters of the predicate if the * is greedy, or less if it is lazy. This happens because such predicate() calls can be nested.
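To make the problem concrete, here is a minimal sketch using only the standard re module. The variable names s, greedy and lazy are just illustrative; the patterns are the one shown above and its lazy variant.

```python
import re

s = 'predicate(foo(x.bar, predicate(foo(...), bar)), bar)'

# Greedy: on two concatenated calls, .* runs to the last ", bar)",
# so both calls are swallowed as a single match.
greedy = re.findall(r'predicate\(.*, bar\)', s + s)

# Lazy: .*? stops at the first ", bar)", which closes the inner
# predicate, so the match ends with unbalanced parentheses.
lazy = re.findall(r'predicate\(.*?, bar\)', s)

print(greedy)
print(lazy)
```

Neither result lines up with the boundaries of a single call, which is why a plain quantifier is not enough here.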
I need a regex that finds the string predicate(...), where ... matches any string that contains an equal number of ('s and )'s (lazy).
If it matters: I am using regex with the re module in Python.
If you install the PyPI package regex, as @Tim Pietzcker suggested, you can use recursive regexes:
>>> import regex
>>> s = 'predicate(foo(x.bar, predicate(foo(...), bar)), bar)'
>>> pattern = regex.compile(r'(\w+)(?=\(((?:\w+\((?2)\)|[^()])*)\))')
>>> pattern.findall(s)
[('predicate', 'foo(x.bar, predicate(foo(...), bar)), bar'),
('foo', 'x.bar, predicate(foo(...), bar)'),
('predicate', 'foo(...), bar'),
('foo', '...')]
You could also constrain it to look for just "predicate":
>>> pattern = regex.compile(r'(predicate)(?=\(((?:\w+\((?2)\)|[^()])*)\))')
>>> pattern.findall(s)
[('predicate', 'foo(x.bar, predicate(foo(...), bar)), bar'),
('predicate', 'foo(...), bar')]
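If installing a third-party package is not an option, a similar result can be sketched with the standard re module plus a manual parenthesis counter. This is a different technique from the recursive pattern above, not a drop-in equivalent, and find_calls is a hypothetical helper name rather than part of any library. It assumes parentheses never appear inside string literals or comments.

```python
import re

def find_calls(text, name=r'\w+'):
    """Return (name, args) pairs for each call whose parentheses balance."""
    results = []
    # Each match consumes only "name(", so nested calls are still found.
    for m in re.finditer(r'(%s)\(' % name, text):
        depth = 1
        i = m.end()
        # Walk forward until the opening parenthesis is closed again.
        while i < len(text) and depth:
            if text[i] == '(':
                depth += 1
            elif text[i] == ')':
                depth -= 1
            i += 1
        if depth == 0:  # skip calls that are never closed
            results.append((m.group(1), text[m.end():i - 1]))
    return results

s = 'predicate(foo(x.bar, predicate(foo(...), bar)), bar)'
all_calls = find_calls(s)
only_predicates = find_calls(s, name='predicate')
```

On the example string, all_calls should contain the same four pairs as the first findall result above, and only_predicates the same two pairs as the second.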