unicode, character-encoding, character-set

Are there characters that have the same encoding regardless of which character set is used? If so, which are they?


I'm asking whether some characters have the same encoding in all character sets and, if so, exactly which characters they are (if there is a list of them). For example, they might be the characters 0-9, or maybe all English letters; I don't know!


Solution

  • There are no characters that have the same encoding in UTF-16 and ASCII (since UTF-16 always uses at least 2 bytes per character, either 2 or 4, while ASCII always uses 1). So the answer is no. (And of course I can invent a new encoding any time I like, so I can always make the answer be no. EBCDIC, likewise, has no overlap with ASCII except for a few control characters.)

    If you are asking whether there are characters that are the same in commonly used 1-byte encodings other than EBCDIC, then almost all of the ASCII range (0-127) is identical in almost all common 1-byte encodings (as well as in UTF-8). The majority of 1-byte encodings are "extended ASCII" and, with a small number of exceptions, encode 0-127 the same way. So, for a carefully selected meaning of "all encodings" (which is not even close to all encodings), the digits 0-9, the Latin alphabet, and some punctuation have the same encoding. Certain control characters are the same across a slightly broader set of encodings (but I assume you're not looking for unprintables).

    But in general, absolutely not.
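
To make the comparison above concrete, here is a minimal Python sketch that checks which characters in the ASCII range encode to identical bytes across a set of codecs. The helper name `shared_ascii_chars` and the particular codec lists are assumptions chosen for illustration, not a survey of all encodings; `cp037` stands in for EBCDIC, which in practice has many code pages.

```python
# Hypothetical helper for illustration; the codec lists below are an
# arbitrary sample, not a survey of all encodings.

def shared_ascii_chars(codecs):
    """Return the characters U+0000..U+007F whose encoded bytes are
    identical in every codec named in `codecs`."""
    shared = []
    for code_point in range(128):
        ch = chr(code_point)
        try:
            encodings = {ch.encode(name) for name in codecs}
        except UnicodeEncodeError:
            # Some codec cannot represent this character at all.
            continue
        if len(encodings) == 1:
            shared.append(ch)
    return shared


# A sample of "extended ASCII" codecs plus UTF-8 (assumed selection).
extended_ascii = ["ascii", "utf-8", "latin-1", "cp1252", "koi8-r", "cp437"]

# How many of the 128 ASCII characters agree across these codecs?
print(len(shared_ascii_chars(extended_ascii)))

# UTF-16 uses at least two bytes per character, so nothing matches ASCII.
print(shared_ascii_chars(["ascii", "utf-16-le"]))

# Adding one EBCDIC code page (cp037) leaves only control characters.
with_ebcdic = shared_ascii_chars(extended_ascii + ["cp037"])
print([hex(ord(c)) for c in with_ebcdic])
```

Swapping other codec names into the lists is an easy way to test a specific set of encodings you care about; the result only ever says something about the codecs you actually listed.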