python-3.x, regex, regex-negation

How to perform a negative lookahead after an unknown number of closing brackets?


Question: If I don't know the number of closing brackets in advance, how can I make sure the last closing bracket is NOT followed by a specific character such as 'Q'?

[[Thing]]Q Additional text... -> Don't match, because Q follows the closing brackets
[[Thing]]M Additional text... -> Match, because the character following the closing brackets is NOT Q

In my case the bracket count varies, for example:

  • {Thing}S
  • {{Thing}}S
  • [{Thing}]S
  • [[[[[Thing]]]]]S


It's probably something simple or obvious, I hope. I could work around the problem with a second filtering pass after the match (another regex or plain Python string checks), but I'd rather have an elegant single-pattern solution if possible.

Much appreciated in advance.

Python 3 code below:

import regex as re

# regex = r'\[.+?\]+'
ob = r'[\[\{]+'                       # open brackets or curly braces
cb = r'[\]\}]'                        # close brackets or curly braces
text_within_brackets = r'[^\[\{]+'    # anything but open brackets or curly braces
ident = 'Q'                           # exclude bracketed text that is followed by Q

working_regex = (
    fr'{ob}'
        fr'{text_within_brackets}'
    fr'{cb}{{2}}'                     # exactly two closing brackets; it seems I have to know the count in advance?
    fr'[^{ident}]'                    # if the ending ident is 'Q', don't match
)

not_working_regex = (
    fr'{ob}'
        fr'{text_within_brackets}'
    fr'{cb}+'                         # thought this might be greedy?
    fr'[^{ident}]'                    # if the ending ident is 'Q', don't match
)


lst = [
    '[[Thing in Brackets]]Q Text to Ignore (No Match)',
    '[[Thing in Brackets]]S Text to Ignore (Match)'
]

print('working regex. Close bracket count known in advance')
for i in lst:
    print(f'{i} -> {re.search(working_regex, i)}')

print('\nnot working regex. Close bracket count NOT known in advance')
for i in lst:
    print(f'{i} -> {re.search(not_working_regex, i)}')

Output

working regex. Close bracket count known in advance
[[Thing in Brackets]]Q Text to Ignore (No Match) -> None
[[Thing in Brackets]]S Text to Ignore (Match) -> <regex.Match object; span=(0, 22), match='[[Thing in Brackets]]S'>

not working regex. Close bracket count NOT known in advance
[[Thing in Brackets]]Q Text to Ignore (No Match) -> <regex.Match object; span=(0, 21), match='[[Thing in Brackets]]'>
[[Thing in Brackets]]S Text to Ignore (Match) -> <regex.Match object; span=(0, 22), match='[[Thing in Brackets]]S'>

Solution

  • The second regex doesn't work as you expect because of backtracking. When the engine reaches the "Q" and [^Q] fails, it backs up one position so that the closing-bracket group matches one fewer bracket. The next character is then a closing bracket rather than a "Q", so [^Q] accepts it and the match completes.

    So you should reject not only a "Q" but also a closing bracket. You also need to handle the case where nothing follows the closing brackets, which I assume should result in a match; a negated character class such as [^Q] must consume a character, so it fails at the end of the string. For that reason it is better to use a negative lookahead assertion (which you already mentioned in the title of your question):

    not_working_regex = (
        fr'{ob}'
            fr'{text_within_brackets}'
        fr'{cb}+'
        fr'(?!{ident}|{cb})'  # don't allow "Q", nor closing bracket to follow the match
    )
    

    ...and then change the variable name as now it works ;-)
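
To check the explanation above, here is a minimal sketch that runs both patterns against the bracket shapes listed in the question. It assumes the standard-library `re` module rather than the third-party `regex` package imported in the question (the patterns use no `regex`-only features). The names `backtracking_regex`, `lookahead_regex`, `matched_text`, and `shapes` are made up for this illustration.

```python
import re  # standard library; the question imports the third-party `regex` package instead

ob = r'[\[\{]+'                     # one or more opening brackets or braces
cb = r'[\]\}]'                      # a single closing bracket or brace
text_within_brackets = r'[^\[\{]+'  # anything except opening brackets or braces
ident = 'Q'                         # character that must not follow the closing run

# Question's attempt: negated character class after a greedy closing run.
backtracking_regex = fr'{ob}{text_within_brackets}{cb}+[^{ident}]'

# Answer's fix: negative lookahead that rejects the ident and a further closer.
lookahead_regex = fr'{ob}{text_within_brackets}{cb}+(?!{ident}|{cb})'


def matched_text(pattern, text):
    """Return the matched substring, or None when nothing matches."""
    m = re.search(pattern, text)
    return m.group(0) if m else None


# Bracket shapes taken from the question's list (hypothetical test inputs).
shapes = ['{Thing}', '{{Thing}}', '[{Thing}]', '[[[[[Thing]]]]]']

for shape in shapes:
    for suffix in ('Q more text', 'S more text', ''):
        text = shape + suffix
        print(f'{text!r:32} -> {matched_text(lookahead_regex, text)!r}')

# The original pattern still "matches" the Q case by giving back one bracket.
print(matched_text(backtracking_regex, '[[Thing]]Q more text'))
```

Note that with the lookahead the match no longer includes the character after the closing brackets, because a lookahead checks the next character without consuming it.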