Tags: python, python-3.x, regex, python-re, python-3.9

Why does re.search work on the first try but raise an error on the second?


So, I was coding a Discord bot where I used on_message() to check whether a message contains a banned word. However, it raised a similar error during execution, so I created a separate file to reproduce it. Here's code that does the same work as in the Discord bot:

from re import search
from re import IGNORECASE
banned_words = (r"N[a-zA-Z0-9]gga")
for banned_word in banned_words:
    if search(banned_word, input("> "), IGNORECASE):
        print("N-word detected")

Here's a test run:

root@kali:~# python3 test.py 
> nigga
N-word detected
> nigga
Traceback (most recent call last):
  File "/root/test.py", line 5, in <module>
    if search(banned_word, input("> "), IGNORECASE):
  File "/usr/lib/python3.9/re.py", line 201, in search
    return _compile(pattern, flags).search(string)
  File "/usr/lib/python3.9/re.py", line 304, in _compile
    p = sre_compile.compile(pattern, flags)
  File "/usr/lib/python3.9/sre_compile.py", line 764, in compile
    p = sre_parse.parse(p, flags)
  File "/usr/lib/python3.9/sre_parse.py", line 948, in parse
    p = _parse_sub(source, state, flags & SRE_FLAG_VERBOSE, 0)
  File "/usr/lib/python3.9/sre_parse.py", line 443, in _parse_sub
    itemsappend(_parse(source, state, verbose, nested + 1,
  File "/usr/lib/python3.9/sre_parse.py", line 549, in _parse
    raise source.error("unterminated character set",
re.error: unterminated character set at position 0

What is going wrong here? Also, it shouldn't loop more than once, right? I don't understand how it asked for input() twice.


Solution

  • banned_words is the string N[a-zA-Z0-9]gga, not a tuple: parentheses around a single value don't create a tuple. So for banned_word in banned_words: iterates over the characters of that string.

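    The first value of banned_word is the string N. With IGNORECASE it matches the n in your input, so the search succeeds. Because input() is called inside the loop body, every iteration prompts you again, which is why you were asked for input twice.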

    The second value of banned_word is the string [. This is not a valid regexp by itself; it's the start of a [...] character set, so compiling it raises re.error: unterminated character set.

    If banned_words is supposed to be a tuple, you need a comma:

    banned_words = (r"N[a-zA-Z0-9]gga",)
    

    But if you want to test multiple regular expressions, you can combine them into a single regexp with | alternation:

    banned_words = r"N[a-z0-9]gga|F[a-z0-9]+ck"
    

    and then do a single search (still passing IGNORECASE, since the character classes above only list lowercase letters) rather than looping.
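A minimal sketch of both points, assuming hypothetical placeholder patterns (b[a-z0-9]d and sp[a-z0-9]+m) in place of a real word list; PATTERNS, BANNED_RE and contains_banned are illustrative names, not part of any library. It first shows that the parenthesized string is just a str whose second character is a lone [ that cannot compile, then builds one compiled alternation that is searched once per message.

```python
import re

# Hypothetical placeholder patterns standing in for a real banned-word list.
PATTERNS = (r"b[a-z0-9]d", r"sp[a-z0-9]+m")

# Pitfall: parentheses without a trailing comma give a plain string,
# so a for-loop over it yields single characters.
single = (r"b[a-z0-9]d")
print(type(single).__name__)   # str
print(list(single)[:2])        # ['b', '[']

# The lone "[" is an unterminated character set and fails to compile.
try:
    re.compile("[")
except re.error as exc:
    print(exc)                 # unterminated character set at position 0

# Fix: wrap each pattern in a non-capturing group, join them with |,
# and compile the alternation once with IGNORECASE.
BANNED_RE = re.compile("|".join(f"(?:{p})" for p in PATTERNS), re.IGNORECASE)


def contains_banned(message: str) -> bool:
    """Return True if any banned pattern occurs anywhere in message."""
    # A single search over the whole message; no loop, no repeated input().
    return BANNED_RE.search(message) is not None


print(contains_banned("that was BAD"))  # True
print(contains_banned("hello"))         # False
```

Wrapping each pattern in (?:...) before joining keeps a pattern that itself contains | from bleeding into its neighbours. In the bot, assuming discord.py, you would call contains_banned(message.content) inside on_message(), reading the text once rather than inside a loop.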