I am trying to remove unwanted characters from a text (Commission.txt) using this:
import re
File = open("/Applications/Python 3.9/Comission.txt",encoding="Latin-1")
Commission=File.read()
CommissionClean = re.sub(r'(Ñ)(Ó)(Ò)(xCA)(xca)([*\])','',Commission)
But I receive the following error message:
raise source.error("unterminated character set",
re.error: unterminated character set at position 20
The \ escapes the following ], making it a literal member of the character set rather than the bracket that closes the set. As a result, the parser also includes ) in the set, and it is still waiting for the closing ] when the pattern ends. Position 20 in the error message is the [ that opens this unterminated set.
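A minimal sketch of that behavior, using only the last group of the pattern (a shortened, illustrative pattern, not the full one from the question). It shows the same error being raised, and that adding a real closing ] yields a set whose members are *, ] and ):

```python
import re

# Shortened, hypothetical pattern: only the final group from the question.
unterminated = r'([*\])'   # \] is a literal ], so the set opened by [ never closes

try:
    re.compile(unterminated)
    error_text = None
except re.error as exc:
    error_text = str(exc)

print(error_text)

# With an extra ] the set is closed; it now contains *, ] and ).
terminated = r'([*\])])'
members = re.findall(terminated, 'a*b]c)d')
print(members)
```

The second pattern is only there to make the parser's reading visible; it is not the fix.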
You need to escape the backslash itself to make it part of the character set.
CommissionClean = re.sub(r'(Ñ)(Ó)(Ò)(xCA)(xca)([*\\])','',Commission)
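This escaping is needed in addition to the raw string literal. The raw string only stops Python's string-literal syntax from interpreting the backslashes, so they reach the regex parser unchanged. The regex parser then applies its own escape rules, under which \\ stands for one literal backslash.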
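The two layers of escaping can be sketched as follows. The sample text and variable names are assumptions made for illustration, and the pattern is reduced to the character set alone:

```python
import re

raw_pattern = r'[*\\]'      # raw literal: the regex parser receives [*\\]
plain_pattern = '[*\\\\]'   # non-raw literal: each backslash must be doubled again

# Both literals produce the same pattern string.
same = raw_pattern == plain_pattern

sample = 'a*b\\c'           # hypothetical text containing a, *, b, \, c
cleaned = re.sub(raw_pattern, '', sample)

print(same, cleaned)        # True abc
```

The raw string saves one level of doubling, but the regex-level escape is still required.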