Recursive regex for matching everything in parentheses (PCRE)


I am surprised that I could not easily find a similar question with an answer on SO. I would like to match everything inside certain function calls. The idea is to remove the function wrappers that are useless and keep only their contents.

foo(some (content)) --> some (content)

So I am trying to match everything in the function call, which can include parentheses. Here is my PCRE regex:

(?<name>\w+)\s*\(\K
(?<e>
     [^()]+
     |
     [^()]*
         \((?&e)\)
     [^()]*
)*
(?=\))

https://regex101.com/r/gfMAIM/1

Unfortunately it doesn't work and I don't really understand why.


Solution

  • Your Group e pattern does not do the right job. The * quantifier sits outside the group, so the (?&e) subroutine call recurses the group body only once per nesting level, and a nested level can then hold at most one (...) substring. It needs to match as many (...) substrings as are present, so the subroutine pattern must itself contain a * or + quantified group. It can even be "simplified" to (?<e>[^()]*(?:\((?&e)\)[^()]*)*).

    Note that your Group e pattern is equal to (?<e>[^()]+|\((?&e)\))*. The [^()]* parts around \((?&e)\) are redundant, since the [^()]+ alternative already consumes the chars other than ( and ) on the current depth level.

    Also, you quantified the Group e pattern, which makes it a repeated capturing group that only keeps the text matched during the last iteration.
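
The "last iteration" behaviour can be seen with a small sketch. It uses Python's built-in `re` module rather than PCRE (Python's `re` has no recursion support), and the pattern and the group name `item` are made up for illustration. Only the repeated-capture behaviour carries over.

```python
import re

# Illustrative pattern (not the one from the question): the capturing
# group "item" is repeated by the * that follows it.
m = re.fullmatch(r"(?P<item>[a-z]+,?)*", "ab,cd,ef")

print(m.group(0))       # the whole match
print(m.group("item"))  # only what the last iteration of the group matched

whole, last = m.group(0), m.group("item")
```

Here the whole match is `ab,cd,ef`, while the group holds only `ef`. Putting the quantifier inside the group, as in the simplified pattern, lets the group capture the full text.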

    You may use

    (?<name>\w+)\s*\(\K(?<e>[^()]*(?:\((?&e)\)[^()]*)*)(?=\))
    

    See the regex demo

    Details

    • (?<name>\w+)\s*\(\K - 1+ word chars, 0+ whitespace chars and a ( char; \K then drops everything matched so far from the overall match
    • (?<e> - start of Group e
      • [^()]* - 0+ chars other than ( and )
      • (?: - start of a non-capturing group:
        • \( - a ( char
        • (?&e) - Group e pattern recursed
        • \) - a )
        • [^()]* - 0+ chars other than ( and )
      • )* - 0 or more repetitions
    • ) - end of e group
    • (?=\)) - a ) must be immediately to the right of the current location.
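
For readers without a PCRE engine at hand, the breakdown above can be mirrored by a small hand-written recursive function. This is only a sketch in plain Python, not how PCRE works internally. The names `match_e`, `call_arguments` and `CALL_START` are hypothetical helpers introduced here, and the scanner simply follows the structure of Group e step by step.

```python
import re

# Same prefix as (?<name>\w+)\s*\( in the pattern above.
CALL_START = re.compile(r"\w+\s*\(")


def match_e(text: str, pos: int) -> int:
    # Mirrors Group e:  [^()]* (?: \( (?&e) \) [^()]* )*
    # Returns the index just past the text that Group e would consume.
    while pos < len(text) and text[pos] not in "()":      # [^()]*
        pos += 1
    while pos < len(text) and text[pos] == "(":           # (?: ... )*
        inner_end = match_e(text, pos + 1)                # (?&e) recursed
        if inner_end >= len(text) or text[inner_end] != ")":
            break                                         # unbalanced: stop before this "("
        pos = inner_end + 1                               # \)
        while pos < len(text) and text[pos] not in "()":  # [^()]*
            pos += 1
    return pos


def call_arguments(text: str):
    # Returns the text between the parentheses of the first balanced call,
    # or None when no balanced call is found.
    for m in CALL_START.finditer(text):
        end = match_e(text, m.end())
        if end < len(text) and text[end] == ")":          # (?=\))
            return text[m.end():end]
    return None


print(call_arguments("foo(some (content))"))
```

For the example in the question, `call_arguments("foo(some (content))")` returns `some (content)`. Several nested groups on the same level, such as `f(a (b) (c (d)) e)`, are handled because the loop over `(`...`)` substrings sits inside the recursive function, which is the same change the corrected regex makes.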