python, brackets, regexp-substr, python-re

Find all the substrings in a string that only have a single bracket on both sides


I'm trying to find all the substrings in a string that only have a single bracket on both sides.

Example: '(pop((hello world))(goodbye(now))(hi)jump)'

I would like to get this list: ['(hi)', '(pop((hello world))(goodbye(now))(hi)jump)'] because they are the only substrings that have exactly one bracket on both sides.

(substrings like '(now))(hi)jump)' don't count because they aren't complete).

My code here:

import re
l = '(M264/M274)+(((551/882)+362/(362/551/882)+889)/((551/882)+362/(362/551/882)+889)+(241/242/275/550/551+882/889/362))'
print(re.findall(r"[^\(.](\([^\(.].*?[^\).]\))[^\).]", l))

will return:

['(362/551/882)', '(362/551/882)']

I'm not sure how to make it include (M264/M274). Please assist me.


Solution

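  • A negated character class such as [^\(.] or [^\).] matches any single character that is not listed in the class, so it requires a character to be present (inside a character class the dot is a literal dot, not a wildcard). Because the pattern has to consume a character before the opening parenthesis, it cannot match (M264/M274) at the very start of the string. The .*? part matches any character except a newline, so it can also match parentheses.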
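
To see why a consuming character class misses a group at the very start of the string, here is a small sketch. The sample strings and the simplified pattern are made up for illustration; they are not the pattern from the question.

```python
import re

# Simplified, hypothetical pattern: one non-"(" character, then a captured
# innermost group. The leading [^(] must consume a real character.
consuming = re.compile(r"[^(](\([^()]+\))")

at_start = consuming.findall("(a)+(b)")      # nothing precedes "(a)"
with_prefix = consuming.findall(" (a)+(b)")  # a space precedes "(a)"

print(at_start)     # ['(b)']
print(with_prefix)  # ['(a)', '(b)']

# Inside a character class, "." is a literal dot, not a wildcard.
literal_dot = re.findall(r"[^(.]", "a.(")
print(literal_dot)  # ['a']
```

With nothing in front of "(a)", the class has no character to consume, so only "(b)" is found.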

    The pattern below does not take balanced parentheses into account, but if you want to match (M264/M274) as in your question, you could use lookarounds in combination with a negated character class:

    (?<!\()\([^()]+\)(?!\))
    

    In parts

    • (?<!\() Negative lookbehind, assert what is on the left is not (
    • \( Match (
    • [^()]+ Match any char except ( or ), 1+ times
    • \) Match )
    • (?!\)) Negative lookahead, assert what is on the right is not )
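
As a quick check, the pattern above can be run against the string from the question's code and against the example string from the question text. This is a minimal sketch; the variable names are chosen here for illustration.

```python
import re

# Lookarounds assert the neighbouring characters without consuming them,
# so a group at the very start of the string can still match.
pattern = re.compile(r"(?<!\()\([^()]+\)(?!\))")

l = '(M264/M274)+(((551/882)+362/(362/551/882)+889)/((551/882)+362/(362/551/882)+889)+(241/242/275/550/551+882/889/362))'
example = '(pop((hello world))(goodbye(now))(hi)jump)'

matches_l = pattern.findall(l)
matches_example = pattern.findall(example)

print(matches_l)        # ['(M264/M274)', '(362/551/882)', '(362/551/882)']
print(matches_example)  # ['(hi)']
```

Note that [^()]+ excludes nested parentheses, so this only finds innermost groups. It does not return the outer '(pop(...)jump)' substring from the question's first example.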


    If you want to match substrings wrapped in exactly two opening and two closing parentheses, you can also use lookarounds. Note that this pattern does not take balanced parentheses into account either.

    (?<!\()\(\((?!\().*?\)\)(?<!\)..)(?!\))
    

    In parts (using the same mechanism of lookarounds, only this time using .*?)

    • (?<!\() Negative lookbehind, assert what is on the left is not (
    • \(\( Match ((
    • (?!\() Negative lookahead, assert what is on the right is not (
    • .*? Match any char except a newline 0+ times
    • \)\) Match ))
    • (?<!\)..) Negative lookbehind, assert that the character directly before the closing )) is not )
    • (?!\)) Negative lookahead, assert what is on the right is not )
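
The double-parenthesis pattern can be tried the same way. This sketch uses the example string from the question text; the variable names are illustrative.

```python
import re

# Exactly two opening and two closing parentheses, checked with lookarounds.
# The lookbehind (?<!\)..) has a fixed width of 3, which Python's re requires.
double = re.compile(r"(?<!\()\(\((?!\().*?\)\)(?<!\)..)(?!\))")

example = '(pop((hello world))(goodbye(now))(hi)jump)'

double_matches = double.findall(example)
print(double_matches)  # ['((hello world))']
```

Because .*? can match parentheses as well, on more deeply nested input the match may run across unrelated groups and end up unbalanced. If balance matters, a small stack-based scan is likely a better fit than a regex.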
