Search code examples
character-encodingcharacter

Are code pages and code charts the same thing?


Based on what I have gathered so far from reading information available online:

  • character set is a bunch of characters that we want to use (like an interface)

  • character encoding is a method of encoding some character set (like an implementation)

What is the relationship between code charts and code pages and how do they fit into the overall context? I am not sure if these two terms are synonyms or if they are referring to distinct concepts.

Do code charts/code pages define character sets through large tables and also provide a method of encoding, making them a part of character encoding? Or, do they only define character sets and leave encoding implementation to another aspect? Additionally, is a locale simply a type of code chart/code page or is it a separate concept altogether?


Solution

  • In the majority of cases, character sets and character encodings are one and the same. For example, ISO-8859-1 defines the character set for Western Europe AND the encoding using an 8bit scheme.

    See the specification for ISO-8859-1: ftp://std.dkuug.dk/JTC1/sc2/wg3/docs/n411.pdf, which includes the encoding implementation.

    Unicode on the other hand separates encoding from the character definition, albeit within a bunch of related documents. In Unicode, just about all current and a good deal of historic characters, symbols and modifiers are mapped to a 32 bit "code point". Encodings of UTF-32, UTF-16 and UTF-8 are then documented separately, to define how the Unicode Code Point is encoded.