A in UTF-8 is U+0041 LATIN CAPITAL LETTER A. A in ASCII is 065.
How is UTF-8 backwards-compatible with ASCII?
ASCII uses only the low 7 bits of an 8-bit byte, so it covers all combinations from 00000000 to 01111111. Each of the 128 byte values in this range is mapped to a specific character.
UTF-8 keeps these exact mappings. The character represented by 01101011 in ASCII (the letter k) is represented by the same byte in UTF-8. All other characters are encoded as sequences of multiple bytes in which each byte has the highest bit set; i.e. every byte of every non-ASCII character in UTF-8 is of the form 1xxxxxxx.
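
To see this concretely, here is a small Python sketch. The helper name utf8_bits is made up for illustration; it only relies on the built-in str.encode and format. It prints the UTF-8 bytes of a few characters in binary, so you can check that ASCII characters come out as a single 0xxxxxxx byte while non-ASCII characters come out as several 1xxxxxxx bytes.

```python
def utf8_bits(text):
    """Return each UTF-8 byte of `text` as an 8-character binary string.

    Hypothetical helper for illustration only.
    """
    return [format(byte, "08b") for byte in text.encode("utf-8")]


# An ASCII character encodes to the same single byte in ASCII and in UTF-8.
assert "k".encode("ascii") == "k".encode("utf-8")

print(utf8_bits("k"))  # ['01101011']
print(utf8_bits("A"))  # ['01000001'], i.e. decimal 65

# Non-ASCII characters become multi-byte sequences; every byte is 1xxxxxxx.
print(utf8_bits("é"))  # ['11000011', '10101001']
print(utf8_bits("€"))  # ['11100010', '10000010', '10101100']
```

Because no byte of a multi-byte sequence falls in the 0xxxxxxx range, a decoder should never mistake part of a non-ASCII character for an ASCII one.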