I'm trying to find all the substrings in a string that only have a single bracket on both sides.
Example: '(pop((hello world))(goodbye(now))(hi)jump)'
I would like to get this list: ['(hi)', '(pop((hello world))(goodbye(now))(hi)jump)']
because they are the only substrings that have exactly one bracket on both sides (substrings like '(now))(hi)jump)' don't count because they aren't complete).
My code here:
import re
l = '(M264/M274)+(((551/882)+362/(362/551/882)+889)/((551/882)+362/(362/551/882)+889)+(241/242/275/550/551+882/889/362))'
print(re.findall(r"[^\(.](\([^\(.].*?[^\).]\))[^\).]", l))
will return:
['(362/551/882)', '(362/551/882)']
I'm not sure how to make it include (M264/M274).
Please assist me.
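A negated character class like [^\(.] or [^\).] matches any single char that is not listed in the class, so it requires at least one char to be present. Inside a character class the dot is a literal dot, not a wildcard. Because (M264/M274) is at the very start of the string, there is no char before it for the leading [^\(.] to match, which is why it is missing from your result. The .*? part can match any char except a newline, so it can also match parentheses.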
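To illustrate the point about the leading negated class, here is a small check in Python. The two short test strings are made up for illustration and do not come from the question; the pattern itself is the one from the question, unchanged.

```python
import re

# The pattern from the question, unchanged.
pattern = r"[^\(.](\([^\(.].*?[^\).]\))[^\).]"

# At the very start of the string there is no character for the leading
# [^\(.] to consume, so the group at index 0 cannot be matched.
at_start = re.findall(pattern, "(ab)+1")
print(at_start)     # []

# With any other character in front, the same group is found.
with_prefix = re.findall(pattern, "x(ab)+1")
print(with_prefix)  # ['(ab)']
```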
The pattern below does not take balanced parentheses into account, but if you want to match (M264/M274) as in your question, you could use lookarounds in combination with a negated character class:
(?<!\()\([^()]+\)(?!\))
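As a quick check, this pattern can be tried in Python against both strings from the question. This is only a sketch of how it behaves on those two inputs, not a general solution for nested parentheses; the variable names are chosen here for illustration.

```python
import re

# Lookaround-based pattern: a "(...)" group with no parentheses inside,
# not preceded by "(" and not followed by ")".
pattern = r"(?<!\()\([^()]+\)(?!\))"

l = '(M264/M274)+(((551/882)+362/(362/551/882)+889)/((551/882)+362/(362/551/882)+889)+(241/242/275/550/551+882/889/362))'
matches = re.findall(pattern, l)
print(matches)  # ['(M264/M274)', '(362/551/882)', '(362/551/882)']

# The first example string from the question.
s = '(pop((hello world))(goodbye(now))(hi)jump)'
print(re.findall(pattern, s))  # ['(hi)']
```

Because lookarounds are zero-width, the group at the start of the string is found even though nothing precedes it.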
In parts
(?<!\()
Negative lookbehind, assert what is on the left is not (
\(
Match (
[^()]+
Match any char except ( or ) 1+ times
\)
Match )
(?!\))
Negative lookahead, assert what is on the right is not )
If you want to match 2 opening and closing parentheses, you can also make use of lookarounds. Note that this also does not take balanced parentheses into account.
(?<!\()\(\((?!\().*?\)\)(?<!\)..)(?!\))
In parts (using the same mechanism of lookarounds, only this time using .*?)
(?<!\()
Negative lookbehind, assert what is on the left is not (
\(\(
Match ((
(?!\()
Negative lookahead, assert what is on the right is not (
.*?
Match any char except a newline 0+ times, as few as possible
\)\)
Match ))
(?<!\)..)
Negative lookbehind, assert that there is no ) directly before the closing ))
(?!\))
Negative lookahead, assert what is on the right is not )
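The double-parenthesis pattern can be tried the same way. The sketch below runs it on both strings from the question. The second call is included to show the caveat mentioned above: on the longer string the lazy .*? runs on to the first )) it can find, so the match is not balanced.

```python
import re

# Pattern for a "((" ... "))" pair with no third parenthesis on either side.
pattern = r"(?<!\()\(\((?!\().*?\)\)(?<!\)..)(?!\))"

s = '(pop((hello world))(goodbye(now))(hi)jump)'
double = re.findall(pattern, s)
print(double)  # ['((hello world))']

l = '(M264/M274)+(((551/882)+362/(362/551/882)+889)/((551/882)+362/(362/551/882)+889)+(241/242/275/550/551+882/889/362))'
on_l = re.findall(pattern, l)
for m in on_l:
    # Compare the number of opening and closing parentheses in each match.
    print(m, m.count('('), m.count(')'))
```

If balanced results are required, a regular expression of this kind is not sufficient on its own, and the nesting depth has to be tracked separately.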