I am using a black-box database, which can only store ASCII characters, to store objects from my Python code. Let's assume this database cannot be swapped out for another, friendlier one. Unfortunately, the data I need to store is in UTF-8 and contains non-English characters, so simply converting my strings to ASCII and back leads to data loss.
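To illustrate the data loss described above, here is a small sketch using only built-in str methods. It shows what happens to 'Parabéns' when it is forced into ASCII: strict encoding fails outright, and the lenient error handlers either substitute or drop the 'é', so the original text cannot be recovered.

```python
# 'é' (U+00E9) has no ASCII code point, so any ASCII round trip must lose it.
original_string = 'Parabéns'

# The default (strict) error handler refuses to encode the string at all
try:
    original_string.encode('ascii')
except UnicodeEncodeError as exc:
    print('strict ASCII encoding failed at position', exc.start)

# errors='replace' substitutes '?' for each unencodable character
replaced = original_string.encode('ascii', errors='replace').decode('ascii')
# errors='ignore' silently drops unencodable characters
ignored = original_string.encode('ascii', errors='ignore').decode('ascii')

print(replaced)  # Parab?ns
print(ignored)   # Parabns
```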
My best idea for solving this is to convert my string to hex (which uses only ASCII characters), store it, and then upon retrieval convert the hex back to UTF-8.
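A quick check of why the hex idea can work, sketched with standard-library calls only: the string is first encoded to UTF-8 bytes, and each byte is then written as two hex digits, so the result contains nothing but the ASCII characters 0-9 and a-f. Note that the hex form is of the UTF-8 bytes, not of the characters themselves, so the non-ASCII 'é' becomes two bytes.

```python
original_string = 'Parabéns'

utf8_bytes = original_string.encode('utf-8')
hex_string = utf8_bytes.hex()

# 'é' takes two bytes in UTF-8, so 8 characters become 9 bytes
print(len(original_string), len(utf8_bytes))  # 8 9
# Each byte becomes exactly two hex digits
print(len(hex_string))  # 18
# Only the characters 0-9 and a-f appear, all of which are ASCII
print(hex_string.isascii(), set(hex_string) <= set('0123456789abcdef'))  # True True
```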
I have tried various combinations of encode and decode, but none have given me the intended result.
Example of how I'd like this to work:
original_string='Parabéns'
hex_string = original_string.some_decode_function('hex') # now it looks like A4 B8 C7 etc
database.store(hex_string)
Upon retrieval:
retrieved_string = database.retrieve(storage_location) # now it looks like A4 B8 C7 etc
final_string = retrieved_string.decode('UTF-8') # now it looks like 'Parabéns'
You can use str.encode to encode the string into bytes (UTF-8 is the default encoding) and then call the bytes.hex method to convert those bytes to their hexadecimal representation. To convert it back, use the bytes.fromhex method to turn the hexadecimal string back into bytes, and then decode them to the original string with bytes.decode:
original_string = 'Parabéns'
encoded = original_string.encode().hex()
print(encoded)
print(bytes.fromhex(encoded).decode())
This outputs:
5061726162c3a96e73
Parabéns
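Putting this together with the store/retrieve workflow from the question, a minimal sketch follows. The AsciiOnlyStore class, its store/retrieve methods, and the to_ascii_hex/from_ascii_hex helpers are all hypothetical stand-ins (the real black-box database's API is unknown); the store simply rejects any non-ASCII text so the round trip can be checked.

```python
class AsciiOnlyStore:
    """Hypothetical stand-in for the black-box database: accepts ASCII text only."""

    def __init__(self):
        self._data = {}

    def store(self, key, value):
        if not value.isascii():
            raise ValueError('store accepts ASCII text only')
        self._data[key] = value

    def retrieve(self, key):
        return self._data[key]


def to_ascii_hex(text):
    """Hypothetical helper: encode text as UTF-8, return the bytes as hex digits."""
    return text.encode('utf-8').hex()


def from_ascii_hex(hex_string):
    """Hypothetical helper: parse hex digits back to bytes and decode them as UTF-8."""
    return bytes.fromhex(hex_string).decode('utf-8')


database = AsciiOnlyStore()
database.store('greeting', to_ascii_hex('Parabéns'))

retrieved_string = database.retrieve('greeting')
final_string = from_ascii_hex(retrieved_string)
print(retrieved_string)  # 5061726162c3a96e73
print(final_string)      # Parabéns
```

One design trade-off to keep in mind: hex doubles the size of the UTF-8 data. If space matters, the standard-library base64 module is another ASCII-only encoding with less overhead (about 4/3 of the byte length), using the same encode-to-bytes-first approach.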