I am writing a forum in Python. I want to strip characters such as the right-to-left mark from user input. Any suggestions? Would a regular expression work?
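If you simply want to restrict the characters to those of a certain character set, you can encode the string in that character set and ignore encoding errors. Note that this drops every character outside that set, including ordinary accented letters, not only the directional marks: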
>>> uc = u'aäöüb'
>>> uc.encode('ascii', 'ignore')
'ab'
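If dropping all non-ASCII text is too aggressive for a forum, another option is to remove only the invisible formatting characters. Here is a minimal sketch, assuming that filtering on the Unicode general category Cf ("format", which includes U+200F RIGHT-TO-LEFT MARK and the directional override characters) is what you are after. The function name strip_format_chars is just something I made up for illustration:

```python
import unicodedata

def strip_format_chars(text):
    # Keep every character except those in Unicode category Cf (format),
    # e.g. U+200F RIGHT-TO-LEFT MARK or U+202E RIGHT-TO-LEFT OVERRIDE.
    return u''.join(ch for ch in text if unicodedata.category(ch) != 'Cf')

# Accented letters survive; only the invisible marks are removed.
cleaned = strip_format_chars(u'abc\u200fdef\u202e')
```

Be aware that Cf also contains the zero-width joiner and non-joiner, which some scripts and emoji sequences rely on, so you may want to whitelist those depending on your audience.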