python, parsing, python-re

re.error with Python 3.9.6 / Mac OS Big Sur


I am trying to remove unwanted characters from a text file (Commission.txt) using this:

import re

File = open("/Applications/Python 3.9/Comission.txt",encoding="Latin-1")
Commission=File.read()
CommissionClean = re.sub(r'(Ñ)(Ó)(Ò)(xCA)(xca)([*\])','',Commission)

But I receive the following error message:

 raise source.error("unterminated character set",
re.error: unterminated character set at position 20

Solution

  • The \ escapes the following ], so the ] becomes a literal member of the character set instead of the bracket that closes it. As a result, the parser also includes the ) in the set, and it is still waiting for the closing ] when the pattern ends.

    You need to escape the backslash itself to make it part of the character set.

    CommissionClean = re.sub(r'(Ñ)(Ó)(Ò)(xCA)(xca)([*\\])','',Commission)
    

    This is in addition to using a raw string literal. The r prefix only stops Python from treating backslashes as escapes while it builds the string; the regex parser still receives every backslash and applies its own escaping rules to them.
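
The difference can be checked in isolation. The following is a minimal sketch that uses only the final group of the pattern; the names `broken`, `fixed`, and `sample` are made up for illustration and are not from the question.

```python
import re

# Only the last group of the original pattern is needed to reproduce the issue.
broken = r'([*\])'   # \] is an escaped literal ], so the set is never closed
fixed = r'([*\\])'   # \\ is a literal backslash, then ] closes the set

try:
    re.compile(broken)
    error_message = None
except re.error as exc:
    # Compiling the broken pattern fails before any matching happens.
    error_message = str(exc)

print(error_message)

# Hypothetical sample text containing an asterisk and a single backslash.
sample = 'a*b\\c'
cleaned = re.sub(fixed, '', sample)
print(cleaned)
```

Here `sample` is written as an ordinary (non-raw) string, so `'\\'` denotes one backslash character, which the fixed character set then removes along with the asterisk.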