encoding utf-32

Why does UTF-32 use four bytes?


If UTF-32 is UCS-4 restricted to 17 planes (1,114,112 code points, the highest being U+10FFFF = 1114111), which requires only 21 bits, what is the fourth byte doing?


Solution

  • The fourth (most significant) byte is just sitting there, occupying space; it is always filled with 0s.

    In theory, a 21-bit or 24-bit interchange format could have been designed. In practice, those are both quite awkward. Few (if any) modern computers have 21- or 24-bit datatypes. Since 32-bit words are easy to work with, it is quite common to use them to store numeric datatypes whose maxima are considerably less than 2^31 - 1.
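
To see the unused byte for yourself, here is a minimal Python sketch. It assumes big-endian UTF-32 without a BOM (Python's built-in "utf-32-be" codec), so the most significant byte comes first. The helper name utf32_be_bytes and the sample characters are made up for illustration.

```python
# Highest code point Unicode allows (last code point of plane 16).
MAX_CODE_POINT = 0x10FFFF


def utf32_be_bytes(ch: str) -> bytes:
    """Encode one character as big-endian UTF-32 with no BOM (hypothetical helper)."""
    return ch.encode("utf-32-be")


# Number of bits needed to represent the largest code point.
bits_needed = MAX_CODE_POINT.bit_length()

# Arbitrary samples: ASCII, Latin-1, BMP, a supplementary-plane character,
# and the highest code point itself.
samples = ["A", "\u00e9", "\u20ac", "\U0001F600", chr(MAX_CODE_POINT)]
encoded = {ch: utf32_be_bytes(ch) for ch in samples}

if __name__ == "__main__":
    print(f"bits needed for U+10FFFF: {bits_needed}")
    for ch, raw in encoded.items():
        # The first byte printed is the most significant one.
        print(f"U+{ord(ch):06X} -> {raw.hex(' ')}")
```

Every sample encodes to exactly four bytes, and the leading byte is zero each time, even for U+10FFFF. The top 11 bits of every UTF-32 code unit carry no information; they are the price of using a word size that hardware handles natively.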