Tags: python, unicode, right-to-left

Removing right-to-left mark and other unicode characters from input in Python


I am writing a forum in Python. I want to strip the right-to-left mark (U+200F) and similar characters from user input. Suggestions? Possibly a regular expression?


Solution

  • If you simply want to restrict the characters to those of a certain character set, you could encode the string in that character set and just ignore encoding errors:

    >>> uc = u'aäöüb'
    >>> uc.encode('ascii', 'ignore')  # Python 2 output shown; Python 3 returns b'ab'
    'ab'
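
If the goal is to keep non-ASCII letters and drop only invisible directional characters such as the right-to-left mark, a narrower filter may fit better than the ASCII approach above. The following is a minimal Python 3 sketch, not a complete solution: the names `BIDI_CONTROLS`, `strip_bidi_controls`, and `strip_format_chars` are made up for illustration, and the character list is an assumption about which marks you want removed.

```python
import re
import unicodedata

# Assumed list of bidirectional control characters to remove:
# U+061C (Arabic letter mark), U+200E/U+200F (left-to-right / right-to-left mark),
# U+202A-U+202E (embeddings and overrides), U+2066-U+2069 (isolates).
BIDI_CONTROLS = re.compile('[\u061c\u200e\u200f\u202a-\u202e\u2066-\u2069]')


def strip_bidi_controls(text):
    """Remove only the directional control characters listed above."""
    return BIDI_CONTROLS.sub('', text)


def strip_format_chars(text):
    """Broader variant: drop every character in Unicode category Cf (format)."""
    return ''.join(ch for ch in text if unicodedata.category(ch) != 'Cf')


cleaned = strip_bidi_controls('abc\u200fdef')   # right-to-left mark removed
untouched = strip_bidi_controls('aäöüb')        # ordinary letters are kept
```

The broader `strip_format_chars` variant also removes characters such as the zero-width joiner and non-joiner, which some scripts and emoji sequences rely on, so the explicit list is probably the safer default for forum input.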