string, utf-8, ascii

Why UTF-8 is compatible with ASCII


In Unicode, A is U+0041 LATIN CAPITAL LETTER A, which UTF-8 encodes as the single byte 0x41. In ASCII, A is 65 in decimal, which is also 0x41.

How is UTF-8 backwards-compatible with ASCII?


Solution

  • ASCII uses only the low 7 bits of an 8-bit byte, so its codes cover every combination from 00000000 to 01111111. Each of the 128 byte values in this range is mapped to a specific character.

    UTF-8 keeps these exact mappings. The character represented by 01101011 in ASCII (the letter k) is represented by the same single byte in UTF-8. All other characters are encoded as sequences of two to four bytes in which each byte has the highest bit set; i.e. every byte of every non-ASCII character in UTF-8 is of the form 1xxxxxxx. A byte of the form 0xxxxxxx is therefore always an ASCII character, so any valid ASCII text is also valid UTF-8 with the same meaning.
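
    The byte patterns described above can be checked with a short Python sketch. The helper name utf8_bits is made up for this illustration, and the sample characters (k, é, €) are arbitrary choices; only the standard str.encode method is used.

```python
def utf8_bits(text):
    """Return each UTF-8 byte of `text` as an 8-character bit string."""
    return [format(byte, "08b") for byte in text.encode("utf-8")]


# ASCII characters: the ASCII and UTF-8 encodings are byte-for-byte identical.
assert "A".encode("ascii") == "A".encode("utf-8") == b"\x41"
print(utf8_bits("k"))  # ['01101011'] -> highest bit is 0

# Non-ASCII characters: multiple bytes, each with the highest bit set.
print(utf8_bits("é"))  # ['11000011', '10101001']
print(utf8_bits("€"))  # ['11100010', '10000010', '10101100']

# Pure ASCII bytes can be decoded as UTF-8 without any change in meaning.
assert b"plain ASCII text".decode("utf-8") == b"plain ASCII text".decode("ascii")
```

    Running it shows that k occupies one byte starting with 0, while é and € occupy two and three bytes that all start with 1, so an ASCII byte can never be mistaken for part of a multi-byte sequence.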