python regex ascii hashlib ordinal

Remove all characters from a string whose ordinals are out of range


What is a good way to remove all characters whose ordinals are outside range(128) (that is, all non-ASCII characters) from a string in Python?

I'm using hashlib.sha256 in Python 2.7, and I'm getting this exception:

UnicodeEncodeError: 'ascii' codec can't encode character u'\u200e' in position 13: ordinal not in range(128)

I assume this means that some non-ASCII character (here u'\u200e', the LEFT-TO-RIGHT MARK) found its way into the string that I am trying to hash.
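In Python 2, handing a unicode object to hashlib.sha256 makes it implicitly encode the string with the default ascii codec, which is where this error comes from. A minimal sketch that reproduces the same error by doing that encoding explicitly (it runs on Python 2.7 or 3; the sample string is hypothetical):

```python
# Hypothetical input: "hello" followed by U+200E (LEFT-TO-RIGHT MARK).
text = u"hello\u200e"
error = None

try:
    # Python 2 performs this conversion implicitly when a unicode object
    # is passed to hashlib.sha256; here it is done explicitly.
    text.encode("ascii")
except UnicodeEncodeError as exc:
    # The exception records which character could not be encoded.
    error = exc
    print(exc)
```

The exception's start attribute points at the offending character, which is a quick way to locate it in a longer string.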

Thanks!


Solution

  • new_safe_str = some_string.encode('ascii', 'ignore')

    I think that would work, assuming some_string is a unicode object.

    Or you could use a list comprehension:

    "".join([ch for ch in orig_string if ord(ch) < 128])
    

    [edit] However, as others have said, it may be better to figure out how to handle Unicode properly in general (for example, by encoding the string to UTF-8 before hashing it), unless you really need the result encoded as ASCII for some reason.
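A minimal sketch comparing the two stripping approaches from the answer with hashing the UTF-8 encoding instead. It runs on Python 2.7 or 3; the helper names and the sample string are hypothetical. Note that the filter keeps ordinals below 128, since ASCII covers 0 through 127.

```python
import hashlib


def strip_non_ascii_encode(s):
    # Drop every character the ascii codec cannot encode; returns bytes.
    return s.encode("ascii", "ignore")


def strip_non_ascii_filter(s):
    # Keep only characters with ordinals 0..127; returns a text string.
    return u"".join(ch for ch in s if ord(ch) < 128)


def sha256_utf8(s):
    # Hash every character by encoding to UTF-8 first; nothing is dropped.
    return hashlib.sha256(s.encode("utf-8")).hexdigest()


text = u"hello\u200e"  # hypothetical input containing a LEFT-TO-RIGHT MARK

# Both stripping approaches remove exactly the same characters.
assert strip_non_ascii_encode(text) == strip_non_ascii_filter(text).encode("ascii")

stripped_digest = hashlib.sha256(strip_non_ascii_encode(text)).hexdigest()
utf8_digest = sha256_utf8(text)

# After stripping, u"hello\u200e" hashes the same as plain u"hello";
# the UTF-8 digest keeps the two inputs distinct.
assert stripped_digest == sha256_utf8(u"hello")
assert utf8_digest != stripped_digest
```

Stripping is lossy: any two strings that differ only in their non-ASCII characters produce the same digest. If the hash is used as an identifier or for integrity checks, encoding to UTF-8 is usually the safer choice.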