python regex string parentheses

Python regex: match parentheses but not nested parentheses


Is it possible to match parentheses like () without allowing nesting? In other words, I want my regex to match () but not (()). The regex I am trying is

\(\[^\(\)])

but it does not seem to be working. Can someone explain to me what I'm doing wrong?


Solution

  • If (foo) in x(foo)x should be matched but (foo) in ((foo)) should not, what you want is not possible with regular expressions in general. Regular expressions describe regular languages, and a regular language cannot keep track of an arbitrary nesting depth. That kind of context (or 'state', as Jonathon Reinhart called it in his comment) is exactly what is needed to distinguish the (foo) substrings in x(foo)x and ((foo)).
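
    To make the 'state' concrete, here is a minimal sketch of a scanner that keeps a depth counter while walking the string, which is the part a regular expression cannot do for arbitrary depth. The function name top_level_flat_groups is made up for illustration, and silently skipping an unmatched ) is an assumption of this sketch, not something the question specifies.

```python
def top_level_flat_groups(text):
    """Return parenthesized substrings that are not nested in, and do not
    contain, any other parentheses."""
    groups = []
    depth = 0      # current nesting depth: the "state" a regex lacks
    start = None   # index of the "(" that opened the current top-level group
    flat = True    # False once the current top-level group contains a "("
    for i, ch in enumerate(text):
        if ch == "(":
            depth += 1
            if depth == 1:
                start, flat = i, True
            else:
                flat = False
        elif ch == ")":
            if depth == 0:
                continue  # unmatched ")": skipped (assumption of this sketch)
            depth -= 1
            if depth == 0 and flat:
                groups.append(text[start:i + 1])
    return groups


print(top_level_flat_groups("x(foo)x"))
print(top_level_flat_groups("((foo))"))
```

    With this approach (foo) is reported for x(foo)x, while ((foo)) yields nothing because the inner group sits at depth 2 and the outer group is not flat.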

    If you only want to match strings that only consist of a parenthesized substring, without any parentheses (matched or unmatched) in that substring, the following regex will do:

    ^\([^()]*\)$
    
    • ^ and $ 'glue' the pattern to the beginning and end of the string, respectively, thereby excluding partial matches
    • note the arbitrary number of repetitions (…*) of the non-parenthesis character inside the parentheses.
    • note how special characters need not be escaped inside a character set, yet still have their literal meaning. (In Python's re module, backslash escapes are still recognized inside a set, so [^\(\)] is equivalent to [^()], just noisier. In POSIX bracket expressions, by contrast, a backslash is an ordinary character, so it would become a literal member of the set. Or in this case be excluded from it, due to the negation.)
    • note how the [ starting the character set isn't escaped, because we actually want its special meaning, rather than its literal meaning

    The last two points might be specific to the dialect of regular expressions Python uses.
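
    As a quick check of the escaping points for Python's re module specifically, the sketch below compares an unescaped and an escaped set, and tries to compile the pattern from the question. The helper name same_matches and the variable names are only for illustration.

```python
import re

# Inside a set, "(" and ")" are literal without escaping.
unescaped = re.compile(r"[^()]")
# Python's re also accepts the escaped forms inside a set.
escaped = re.compile(r"[^\(\)]")


def same_matches(text):
    """True if both sets pick out exactly the same characters of text."""
    return unescaped.findall(text) == escaped.findall(text)


# The question's pattern escapes "[", so no set is ever opened and the
# final ")" has no matching "(".
try:
    re.compile(r"\(\[^\(\)])")
    question_pattern_compiles = True
except re.error as exc:
    question_pattern_compiles = False
    print("compile error:", exc)

print(same_matches(r"a\(b)c"))
```

    The input a\(b)c contains a backslash on purpose: both sets treat it like any other non-parenthesis character.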

    So this will match () and (foo) completely, but not (not even partially) (foo)bar), (foo(bar), x(foo), (foo)x or ()().
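
    The list above can be checked directly. This is a small sketch using the pattern from this answer; the names FLAT_PARENS and is_flat_group are illustrative only.

```python
import re

# Pattern from the answer: the whole string must be a single
# parenthesized group with no parentheses inside it.
FLAT_PARENS = re.compile(r"^\([^()]*\)$")


def is_flat_group(text):
    """True if text consists of exactly one non-nested parenthesized group."""
    return FLAT_PARENS.search(text) is not None


samples = ["()", "(foo)", "(foo)bar)", "(foo(bar)", "x(foo)", "(foo)x", "()()"]
for sample in samples:
    print(f"{sample!r}: {is_flat_group(sample)}")
```

    One caveat: in Python, $ also matches just before a trailing newline, so "(foo)\n" would be accepted. If that matters, re.fullmatch(r"\([^()]*\)", text) is an alternative that needs no anchors.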