
Whatever happened to the Unicode Character MATHEMATICAL DOUBLE-STRUCK CAPITAL C?


If you look at the Unicode block for Mathematical Alphanumeric Symbols, you will notice that MATHEMATICAL DOUBLE-STRUCK CAPITAL C is missing. And it is not the only one. Why? What is the point of having DOUBLE-STRUCK letters if you don't have all 26?


Solution

  • It's a fluke of history

    ℂ exists in Unicode, but at a different location from the rest of the blackboard bold alphabet. Back in 1991, ℂ was part of the first edition of Unicode, but there was no Mathematical Alphanumeric Symbols block yet. Instead, it got lumped into a hodgepodge block called Letterlike Symbols, along with a grab bag of other letter-derived characters. For the first decade of its existence, Unicode had only these double-struck letters:

    ℂ  U+2102 the set of complex numbers

    ℍ  U+210D the algebra of quaternions (Hamilton)

    ℕ  U+2115 the set of natural numbers

    ℙ  U+2119

    ℚ  U+211A the set of rational numbers

    ℝ  U+211D the set of real numbers

    ℤ  U+2124 the set of integers
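    You can verify these assignments yourself with a few lines of Python (a sketch using only the standard library's unicodedata module):

    ```python
    import unicodedata

    # The seven double-struck letters that have sat in the Letterlike
    # Symbols block since Unicode 1.x, all within the 16-bit BMP.
    originals = "ℂℍℕℙℚℝℤ"

    for ch in originals:
        cp = ord(ch)
        assert cp <= 0xFFFF  # each code point fits in 16 bits
        print(f"U+{cp:04X}  {unicodedata.name(ch)}")
    ```

    The first line printed, for example, is `U+2102  DOUBLE-STRUCK CAPITAL C`.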

    But why did those symbols get put there seemingly all higgledy-piggledy? The answer is twofold: First, those particular symbols, except ℙ, had been placed together in the much older XCCS, the Xerox Character Code Standard, which Unicode evolved from.

    Second, Unicode had a limited address space even at the start and had to be very picky about which characters could be added. The original proposal for Unicode in 1988 by Dr. Joseph Becker — who had created XCCS in 1980 — made the case for the "sufficiency of 16 bits" like so,

    Are 16 bits, providing at most 65,536 distinct codes, sufficient to encode all characters of all the world’s scripts? Since the definition of a “character” is itself part of the design of a text encoding scheme, the question is meaningless unless it is restated as: Is it possible to engineer a reasonable definition of “character” such that all the world’s scripts contain fewer than 65,536 of them?

    The answer to this is Yes.

    Of course, with 20/20 hindsight, we can see that Dr. Becker was mistaken, though perhaps he can be forgiven for not predicting the astronomical explosion of legitimate definitions of "character" that people wanted in Unicode (and maybe some, not so legitimate… 🯁🯂🯃🤮). By version 2.0 of Unicode (1996) the standard recognized that 16 bits weren't enough and defined 16 more "planes", each as large as the original 16-bit code space. The original plane was retroactively named "Plane 0" or "the BMP" (Basic Multilingual Plane).
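    The arithmetic of that expansion is small enough to check directly (a Python sketch; the constants come straight from the standard):

    ```python
    PLANE_SIZE = 0x10000   # 65,536 code points per plane
    NUM_PLANES = 17        # Plane 0 (the BMP) plus 16 supplementary planes

    total = PLANE_SIZE * NUM_PLANES
    print(total)           # 1,114,112 code points: U+0000 through U+10FFFF

    # Python exposes the same ceiling directly:
    import sys
    assert sys.maxunicode == total - 1  # 0x10FFFF
    ```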

    However, that extra space wasn't used immediately. We'd have to wait until 2001 for Unicode 3.1 to blow open the gates, adding over 44,000 new characters, including the double-struck alphabet in question here:

    𝔸  U+1D538

    𝔹  U+1D539

    𝔻  U+1D53B

    𝔼  U+1D53C

    𝔽  U+1D53D

    𝔾  U+1D53E

    𝕀  U+1D540

    𝕁  U+1D541

    𝕂  U+1D542

    𝕃  U+1D543

    𝕄  U+1D544

    𝕆  U+1D546

    𝕊  U+1D54A

    𝕋  U+1D54B

    𝕌  U+1D54C

    𝕍  U+1D54D

    𝕎  U+1D54E

    𝕏  U+1D54F

    𝕐  U+1D550

    𝕒  U+1D552

    𝕓  U+1D553

    𝕔  U+1D554

    𝕕  U+1D555

    𝕖  U+1D556

    𝕗  U+1D557

    𝕘  U+1D558

    𝕙  U+1D559

    𝕚  U+1D55A

    𝕛  U+1D55B

    𝕜  U+1D55C

    𝕝  U+1D55D

    𝕞  U+1D55E

    𝕟  U+1D55F

    𝕠  U+1D560

    𝕡  U+1D561

    𝕢  U+1D562

    𝕣  U+1D563

    𝕤  U+1D564

    𝕥  U+1D565

    𝕦  U+1D566

    𝕧  U+1D567

    𝕨  U+1D568

    𝕩  U+1D569

    𝕪  U+1D56A

    𝕫  U+1D56B

    𝟘  U+1D7D8

    𝟙  U+1D7D9

    𝟚  U+1D7DA

    𝟛  U+1D7DB

    𝟜  U+1D7DC

    𝟝  U+1D7DD

    𝟞  U+1D7DE

    𝟟  U+1D7DF

    𝟠  U+1D7E0

    𝟡  U+1D7E1

    One can see that those code points are not in the original 16-bit BMP because they require five hexadecimal digits to represent instead of just four. Notice also the gaps in the range (U+1D53A, U+1D53F, and so on): the slots where C, H, N, O is there but P, Q, R, and Z would fall are permanently reserved, because those seven letters had already been encoded in Letterlike Symbols, and Unicode does not encode the same character twice.
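    The BMP boundary is visible in practice, too (a Python sketch; in UTF-16, anything above U+FFFF costs a surrogate pair, i.e. two 16-bit code units):

    ```python
    new = "𝔸"   # U+1D538, Plane 1, added in Unicode 3.1
    old = "ℂ"   # U+2102, Plane 0 (the BMP), there since 1991

    assert ord(new) > 0xFFFF    # five hex digits, beyond 16 bits
    assert ord(old) <= 0xFFFF   # four hex digits, inside the BMP

    # In UTF-16, a BMP character takes one 16-bit code unit (2 bytes);
    # a supplementary character takes a surrogate pair (4 bytes).
    print(len(old.encode("utf-16-be")))  # 2
    print(len(new.encode("utf-16-be")))  # 4
    ```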

    A few months later, with version 3.2 (March 2002), which added the few remaining mathematical brackets and symbols, like ℽ, ⅀, and ⅁, it could finally be said that “Unicode includes virtually all the standard characters used in mathematics”.