
Unicode, Bytes, and String to Integer Conversion


I am writing a program that deals with letters from a foreign alphabet. The program takes as input a number that corresponds to a character's Unicode code point. For example, 062A is the code point assigned to one such character.

I first ask the user to input a number that corresponds to a specific letter, e.g. 062A. I am now attempting to turn that number into a 16-bit integer that Python can decode in order to print the character back to the user.

example:

for \u0394

print(bytes([0x94, 0x03]).decode('utf-16'))

however when I am using

int('062A', '16')

I receive this error:

ValueError: invalid literal for int() with base 10: '062A'

I know it is because I am using A in the string, but that is part of the Unicode value for the symbol. Can anyone help me?


Solution

  • however when I am using int('062A', '16'), I receive this error: ValueError: invalid literal for int() with base 10: '062A'

    No, you aren't:

    >>> int('062A', '16')
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: 'str' object cannot be interpreted as an integer
    

    The error is exactly what it says. The problem is not the '062A', but the '16': the base must be passed as an integer, not a string:

    >>> int('062A', 16)
    1578
    
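    As a quick sanity check (not part of the original answer), `int` with an integer base happily parses hexadecimal digits, including the letters A–F in either case, and even tolerates a `0x` prefix:

    ```python
    # int() parses hex digits once the base is given as an integer
    assert int('062A', 16) == 1578
    assert int('062a', 16) == 1578    # hex letters are case-insensitive
    assert int('0x062A', 16) == 1578  # an optional 0x prefix is accepted
    ```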

    If you want the character for a given Unicode code point, then converting through bytes and UTF-16 is too much work. Just ask for it directly using chr, for example:

    >>> chr(int('0394', 16))
    'Δ'
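
    Putting the two pieces together, a minimal sketch of the asker's full flow (the variable names here are illustrative, not from the question) would be:

    ```python
    code = '062A'              # e.g. the user's hex input
    char = chr(int(code, 16))  # int(..., 16) parses hex; chr() maps code point -> str
    print(char)

    # The inverse direction: ord() recovers the code point,
    # and format(..., '04X') renders it back as 4-digit uppercase hex.
    assert format(ord(char), '04X') == '062A'
    ```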