
The use of different character types in C++


The wchar_t type is guaranteed to be large enough to hold any character in the machine's largest extended character set. Why then have a need for signed char and unsigned char? Furthermore, if there is indeed a reason to use the latter two in practice, can someone please provide small examples of when one would use signed char vs unsigned char?

The reason I ask the last question is because char is signed on some machines and unsigned on others; there is no default qualifier for char. C++ primer states that when using char you should make it explicit on which version you are using. I wonder why we even have a signed char if characters in the machine's basic character set are represented by the integrals 0 - 255.


Solution

  • While "wchar_t [is] large enough to hold any character in the machine's largest extended character set", we may know we're not storing anything "extended" and not wish to waste memory and slow the processing of text by using a larger type than we need.
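    The size difference is easy to demonstrate. sizeof(char) is always exactly 1 by definition, while the size of wchar_t is implementation-defined (commonly 2 bytes on Windows, 4 on Linux and macOS):

    ```cpp
    #include <iostream>

    int main() {
        // char is always exactly 1 byte by definition; wchar_t's size
        // is implementation-defined (commonly 2 or 4 bytes).
        std::cout << "sizeof(char)    = " << sizeof(char) << '\n';
        std::cout << "sizeof(wchar_t) = " << sizeof(wchar_t) << '\n';
    }
    ```

    For a large buffer of plain ASCII text, that factor of 2 or 4 is pure waste.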

    signed char and unsigned char serve as storage for small integral values - typically -128..127 and 0..255 respectively - so you use them when you want such a number and care about memory usage. Better yet, use int8_t and uint8_t or similar, which have the advantage of implying the types are a correspondingly shorter form of the [u]int16/32/64_t types: that's clearer conceptually if you're storing a number. But because int8_t et al. are usually just typedefs for the character types, your numbers may make unwanted matches with overloads for char - for example, after my_int8 = 65;, std::cout << my_int8 might print 'A' (the character with ASCII code 65) rather than 65.
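    The streaming pitfall above can be reproduced directly. This sketch assumes int8_t is a typedef for signed char, as it is on mainstream implementations:

    ```cpp
    #include <cstdint>
    #include <iostream>

    int main() {
        std::int8_t my_int8 = 65;

        // Matches the operator<< overload for signed char,
        // so the character 'A' is printed, not the number 65.
        std::cout << my_int8 << '\n';

        // Widening to int first gets the numeric output you expected.
        std::cout << static_cast<int>(my_int8) << '\n';
    }
    ```

    The static_cast<int> (or the shorthand unary +my_int8) is the usual idiom for forcing numeric output.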

    unsigned char also has special significance: the Standard's aliasing rules allow pointers to unsigned char (and char) to examine the raw bytes - the object representation - of an object of any other type, which is why it is the conventional type for low-level byte manipulation.
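    For instance, a sketch that inspects the bytes of a 32-bit integer through an unsigned char pointer - a common way to check endianness:

    ```cpp
    #include <cstdint>
    #include <iostream>

    int main() {
        std::uint32_t value = 0x01020304;

        // The Standard permits unsigned char pointers to alias any object,
        // so reading value's bytes this way is well-defined.
        const unsigned char* bytes =
            reinterpret_cast<const unsigned char*>(&value);

        for (std::size_t i = 0; i < sizeof(value); ++i)
            std::cout << static_cast<int>(bytes[i]) << ' ';
        std::cout << '\n';  // "4 3 2 1 " on little-endian, "1 2 3 4 " on big-endian
    }
    ```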

    C++ primer states that when using char you should make it explicit on which version you are using.

    Nonsense. If you're storing simple ASCII text (with values 0 to 127), use char and let the implementation choose which one to use. This also answers another of your questions...
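    If you ever do need to know which choice your implementation made, you can query it rather than hard-code an assumption:

    ```cpp
    #include <iostream>
    #include <limits>

    int main() {
        // Whether plain char is signed is implementation-defined;
        // numeric_limits reports the choice at compile time.
        std::cout << "char is "
                  << (std::numeric_limits<char>::is_signed ? "signed"
                                                           : "unsigned")
                  << " on this implementation\n";
    }
    ```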

    I wonder why we even have a signed char if characters in the machine's basic character set are represented by the integrals 0 - 255.

    ...the "basic character set" covers ASCII values 0 to 127 only. Specific systems, protocols, or programs may or may not give some implementation-specific significance or graphical representation to other character values.