Search code examples
pythonunicodecharacter-encoding

Escape Unicode characters into ASCII, and converting them back into Unicode with no loss


I'm doing a Python project where I must encode a string into ASCII to use it, and later take it back and convert it back into Unicode. The problem is that the string may contain characters like accents, which aren't in the ASCII charset. I would like to escape these in some form in my string, then take them back with no loss.

I tried poking around with .encode and .decode to no avail, and looked at some answers on this site, but none convered both the encoding and decoding. I've seen the ascii function in Python, but it has no opposite.

Note: Using JSON, like proposed in the duplicate notes, isn't what I can do in that case. I needed to be able to have pure ASCII, which can be converted back later.

2nd note: Using a byte format, also proposed in the duplicate notes, won't work either. I need a STRING in ASCII.


Solution

  • I would use urllib.parse.quote and urllib.parse.unquote for that purpose, simple example:

    import urllib.parse
    text_with_accent = "El Niño"
    ascii_text = urllib.parse.quote(text_with_accent)
    print(ascii_text)  # El%20Ni%C3%B1o
    print(ascii_text.isascii())  # True
    original_text = urllib.parse.unquote(ascii_text)
    print(text_with_accent==original_text)  # True
    

    urllib.parse is part of standard library so you do not need installing anything beyond python itself.