Tags: unicode, utf-8

Why does the degree symbol's encoding differ between UTF-8 and Unicode?

According to http://www.utf8-chartable.de/ and http://www.fileformat.info/info/unicode/char/b0/index.htm, the Unicode code point for the degree sign is B0, but its UTF-8 encoding is C2 B0. How come?


Solution

  • UTF-8 is a way to encode Unicode code points using a variable number of bytes (how many bytes are needed depends on the code point).

    Code points between U+0080 and U+07FF use the following 2-byte encoding:

    110xxxxx 10xxxxxx
    

    where the x's represent the bits of the code point being encoded.

    Let's consider U+00B0. In binary, 0xB0 is 10110000. Padded to the template's 11 payload bits, that is 00010 110000: the high 5 bits (00010) fill the first byte and the low 6 bits (110000) fill the second, giving:

     11000010 10110000
    

    In hex, this is 0xC2 0xB0. The short sketch after this answer runs the same computation in code.
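
To make the bit arithmetic concrete, here is a minimal sketch in Python of the 2-byte encoding described above, applied to U+00B0. The helper name encode_2byte_utf8 is just for illustration (it is not a standard API); the last line cross-checks the result against Python's built-in UTF-8 encoder.

    # Minimal sketch of the 2-byte UTF-8 encoding for code points U+0080..U+07FF.
    def encode_2byte_utf8(code_point: int) -> bytes:
        """Encode a code point in the range U+0080..U+07FF as two UTF-8 bytes."""
        assert 0x80 <= code_point <= 0x7FF
        byte1 = 0b11000000 | (code_point >> 6)           # 110xxxxx: high 5 bits
        byte2 = 0b10000000 | (code_point & 0b00111111)   # 10xxxxxx: low 6 bits
        return bytes([byte1, byte2])

    print(encode_2byte_utf8(0x00B0).hex())    # c2b0
    print("\u00B0".encode("utf-8").hex())     # c2b0 -- matches the built-in encoder

For code points below U+0080, UTF-8 uses a single byte identical to ASCII, which is why only characters above U+007F end up looking "different" from their code point values.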