
Convert Unicode to UTF-32


How do I convert U+0065 to UTF-32 format?

U+0065
0000 0000 0110 0101

UTF-32
xxxx xxxx xxxx xxxx xxxx xxxx xxxx xxxx

Convert U+0065 to UTF-32:

 0000 0000 0000 0000 0000 0000 0110 0101

Result in hex is 0x00000065

Is that correct?


Solution

  • Yes, it is correct.

    UTF-32 always uses one 32-bit code unit per code point. Unicode defines code points only up to U+10FFFF, which fits in 21 bits, so a UTF-32 code unit is always the code point itself, zero-padded to 32 bits.

    Because U+0065 lies in the U+0000..U+007F range, UTF-8 encodes it as a single byte (01100101). UTF-16 encodes it as a single 16-bit code unit (00000000 01100101), and UTF-32 as a single 32-bit code unit (00000000 00000000 00000000 01100101).
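
The comparison above can be checked with a short Python sketch. It uses Python's standard "utf-8", "utf-16-be" and "utf-32-be" codecs. The big-endian, no-BOM variants are an assumption made here so that the bytes match the bit patterns written in the answer; the plain "utf-32" codec would prepend a byte order mark. The helper name utf32_be_bits is hypothetical and not part of any library.

```python
def utf32_be_bits(code_point: int) -> str:
    """Format a code point as its 32-bit UTF-32 value, in groups of 4 bits.

    Hypothetical helper for illustration; big-endian bit order is assumed.
    """
    # UTF-32 can only carry Unicode scalar values: 0..0x10FFFF minus surrogates.
    if not 0 <= code_point <= 0x10FFFF or 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("not a Unicode scalar value")
    bits = format(code_point, "032b")  # zero-pad the code point to 32 bits
    return " ".join(bits[i:i + 4] for i in range(0, 32, 4))


ch = "\u0065"  # U+0065, LATIN SMALL LETTER E

# Encode the same character in the three encoding forms (big-endian, no BOM).
utf8_hex = ch.encode("utf-8").hex()
utf16_hex = ch.encode("utf-16-be").hex()
utf32_hex = ch.encode("utf-32-be").hex()

print(utf32_be_bits(0x0065))           # 0000 0000 0000 0000 0000 0000 0110 0101
print(utf8_hex, utf16_hex, utf32_hex)  # 65 0065 00000065

# The UTF-32 value is numerically the code point itself.
print(int(utf32_hex, 16) == ord(ch))   # True
```

With a little-endian codec ("utf-32-le") the same four bytes would appear in reverse order (65 00 00 00), but the 32-bit value they represent is still 0x00000065.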