
In the Unicode standard, why does U+12ca = 0x12ca? Where does the 0 come from, and how does 0x12ca = 4810 decimal?


I'm learning about Unicode basics and I came across this passage:

"The Unicode standard describes how characters are represented by code points. A code point is an integer value, usually denoted in base 16. In the standard, a code point is written using the notation U+12ca to mean the character with value 0x12ca (4810 decimal)."

I have three questions from here.

  1. What does the ca stand for? In some places I've seen it written as just U+12. What's the difference?
  2. Where did the 0 in 0x12ca come from? What does it mean?
  3. How does the value 0x12ca become 4810 decimal?

It's my first post here and I would appreciate any help! Have a nice day, y'all!!


Solution

    1. What does the ca stand for?

    It stands for the hexadecimal digits c and a. The codepoint U+12ca is written with the four hex digits 1, 2, c and a.

    In some places I've seen it written as just U+12. What's the difference?

    Either that is a mistake, or U+12 is another (IMO sloppy / ambiguous) way of writing U+0012 ... which is a different Unicode codepoint to U+12ca.
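    For example, here is a quick Python 3 sketch (Python chosen purely for illustration) showing that U+0012 and U+12ca name two different characters:

        import unicodedata

        # chr() turns an integer codepoint into the corresponding character.
        print(unicodedata.name(chr(0x12ca)))                       # ETHIOPIC SYLLABLE WI
        print(unicodedata.name(chr(0x12), "<control, no name>"))   # U+0012 is an unnamed control character
        print(chr(0x12) == chr(0x12ca))                            # False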

    2. Where did the 0 in 0x12ca come from? What does it mean?

    That is a different notation: it is the hexadecimal (integer) literal notation used in various programming languages, e.g. C, C++, Java and so on. It represents a number ... not necessarily a Unicode codepoint.

    The 0x is just part of the notation. (It "comes from" the respective language specifications ...)
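    As a small illustration (a Python sketch here; C, C++ and Java use the same 0x prefix for hex literals), such a literal is just another spelling of an ordinary integer:

        n = 0x12ca        # a hexadecimal integer literal
        print(n)          # 4810 -- the same number, shown in base 10
        print(n == 4810)  # True
        print(hex(4810))  # '0x12ca'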

    3. How does the value 0x12ca become 4810 decimal?

    The 0x prefix means that the remaining characters are hexadecimal digits (i.e. base 16), where 0 through 9 have their usual values and:

    • a or A represents 10,
    • b or B represents 11,
    • c or C represents 12,
    • d or D represents 13,
    • e or E represents 14,
    • f or F represents 15.

    So 0x12ca is 1 x 16^3 + 2 x 16^2 + 12 x 16^1 + 10 x 16^0 = 4096 + 512 + 192 + 10 = 4810.

    (Do the arithmetic yourself to check. Converting between base 10 and base 16 is simple high-school mathematics.)
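    If you would rather let the computer check it, here is a minimal Python sketch of the same place-value expansion:

        # Each hex digit is weighted by a power of 16, exactly as in the sum above.
        value = 1 * 16**3 + 2 * 16**2 + 12 * 16**1 + 10 * 16**0
        print(value)            # 4810
        print(int("12ca", 16))  # 4810 -- the built-in base-16 conversion agrees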