
Why does unsigned char have different default initialization behaviour than other data types?


I am reading through the cppreference page on default initialization and I noticed a section that states something along these lines:

//UB
int x;
int y = x;        
   
//Defined and ok
unsigned char c;
unsigned char d = c;

The same rule that applies to unsigned char applies to std::byte as well.

My question is: why does every other non-class type (int, bool, char, etc.) result in UB if you use the value before assigning it (as in the example above), but not unsigned char? Why is unsigned char special?

The page I am reading for reference


Solution

  • The difference is not in initialisation behaviour. The value of uninitialised int is indeterminate and default initialisation leaves it indeterminate. The value of uninitialised unsigned char is indeterminate and default initialisation leaves it indeterminate. There is no difference there.

    The difference is that the behaviour of producing an indeterminate value of type int, or of any other type besides the exceptional unsigned char and std::byte, is undefined (unless the value is discarded).
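
    A minimal sketch of that distinction, using made-up function names (int_assigned_before_read, uchar_copied_then_replaced). Declaring the variable is harmless for both types; what matters is whether an indeterminate value is read, and through which type. It assumes a compiler that follows the C++14 wording, and a compiler may still warn about the uninitialized copy.

```cpp
// Hypothetical helper names, for illustration only.

int int_assigned_before_read() {
    int x;        // default-initialized: value is indeterminate, no UB yet
    x = 7;        // the indeterminate value is replaced before any read
    return x;     // well-defined
    // Reading x before the assignment (e.g. `int y = x;`) would be undefined.
}

unsigned char uchar_copied_then_replaced() {
    unsigned char c;        // default-initialized: value is indeterminate
    unsigned char d = c;    // covered by the exception; d is now indeterminate too
    d = 7;                  // replace it before using the value for anything else
    return d;               // well-defined
}
```

    Both functions return 7. The only thing the exception buys in the second one is that the copy from c to d is not itself undefined.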

    The exception for unsigned char (and later std::byte) was added to the language in C++14, when the term "indeterminate value" was properly defined. Since the change was a defect resolution, to my understanding it also applies to the official standard at the time, C++11.

    I could not find a documented rationale for that design choice. Here is a timeline of the definitions (all standard quotes are from drafts):

    C89 - 1.6 DEFINITIONS OF TERMS

    Undefined behavior --- behavior, upon use of ... indeterminately-valued objects


    C89 - 3.5.7 Initialization - Semantics

    ... If an object that has automatic storage duration is not initialized explicitly, its value is indeterminate.

    There are no exceptions for any type. You'll see why the C standard is relevant when reading the C++98 standard.

    C++98 - [dcl.init]

    ... Otherwise, if no initializer is specified for an object, the object and its subobjects, if any, have an indeterminate initial value

    There was no definition of what an indeterminate value is or of what happens when you use one. The intended meaning was presumably the same as in C89, but it is underspecified.

    C99 - 3. Terms, definitions, and symbols - 3.17.2

    3.17.2 indeterminate value

    either an unspecified value or a trap representation

    3.17.3 unspecified value

    valid value of the relevant type where this International Standard imposes no requirements on which value is chosen in any instance

    NOTE An unspecified value cannot be a trap representation.


    C99 - 6.2.6 Representations of types - 6.2.6.1 General

    Certain object representations need not represent a value of the object type. If the stored value of an object has such a representation and is read by an lvalue expression that does not have character type, the behavior is undefined. If such a representation is produced by a side effect that modifies all or any part of the object by an lvalue expression that does not have character type, the behavior is undefined. Such a representation is called a trap representation.


    C99 - J.2 Undefined behavior

    The behavior is undefined in the following circumstances:

    • ...
    • The value of an object with automatic storage duration is used while it is indeterminate
    • A trap representation is read by an lvalue expression that does not have character type
    • A trap representation is produced by a side effect that modifies any part of the object using an lvalue expression that does not have character type
    • ...

    C99 introduced the term trap representation. Using a trap representation is also UB, just like using an indeterminate value. Character types (char, unsigned char and signed char) don't have trap representations, and may be used to operate on trap representations of other types without UB.

    C++ core language issue - 616. Definition of “indeterminate value”

    The C++ Standard uses the phrase “indeterminate value” without defining it. C99 defines it as “either an unspecified value or a trap representation.” Should C++ follow suit?

    Proposed resolution (October, 2012):

    Change [dcl.init] paragraph 12 as follows:

    If no initializer is specified for an object, the object is default-initialized. When storage for an object with automatic or dynamic storage duration is obtained, the object has an indeterminate value, and if no initialization is performed for the object, that object retains an indeterminate value until that value is replaced (5.17 [expr.ass]). [Note: Objects with static or thread storage duration are zero-initialized, see 3.6.2 [basic.start.init]. —end note] If an indeterminate value is produced by an evaluation, the behavior is undefined except in the following cases:

    • If an indeterminate value of unsigned narrow character type (3.9.1 [basic.fundamental]) is produced by the evaluation of:
        • the second or third operand of a conditional expression (5.16 [expr.cond]),
        • the right operand of a comma (5.18 [expr.comma]),
        • the operand of a cast or conversion to an unsigned narrow character type (4.7 [conv.integral], 5.2.3 [expr.type.conv], 5.2.9 [expr.static.cast], 5.4 [expr.cast]), or
        • a discarded-value expression (Clause 5 [expr]),

    then the result of the operation is an indeterminate value.

    If an indeterminate value of unsigned narrow character type (3.9.1 [basic.fundamental]) is produced by the evaluation of the right operand of a simple assignment operator (5.17 [expr.ass]) whose first operand is an lvalue of unsigned narrow character type, an indeterminate value replaces the value of the object referred to by the left operand.

    If an indeterminate value of unsigned narrow character type (3.9.1 [basic.fundamental]) is produced by the evaluation of the initialization expression when initializing an object of unsigned narrow character type, that object is initialized to an indeterminate value.
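
    As a rough illustration of the contexts listed in the proposed resolution, here is a sketch that exercises each one. The function name listed_cases and its locals are invented for this example, and the comments reflect my reading of the quoted wording rather than an authoritative interpretation.

```cpp
// Sketch only: `listed_cases` and its local names are hypothetical.
unsigned listed_cases(bool flag) {
    unsigned char c;                                  // default-initialized: indeterminate
    unsigned char d = c;                              // initializing an unsigned char object
    unsigned char e = 0;
    e = c;                                            // right operand of a simple assignment
    unsigned char f = flag ? c : d;                   // second or third operand of a conditional
    unsigned char g = (static_cast<void>(0), c);      // right operand of a comma
    unsigned char h = static_cast<unsigned char>(c);  // cast to an unsigned narrow character type
    static_cast<void>(c);                             // discarded-value expression

    // d, e, f, g and h all hold indeterminate values at this point.
    // Uses that are not on the list, such as `c + 1` or `c == 0`,
    // would fall back to the general rule and be undefined.

    // Replace every indeterminate value before doing arithmetic on it.
    c = d = e = f = g = h = 1;
    return c + d + e + f + g + h;
}
```

    After the final assignment every variable holds 1, so the function returns 6 regardless of flag. None of the listed contexts makes the value determinate; they only allow it to be passed along.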

    The proposed change was accepted as a defect resolution with some further changes (issue 1213), and the wording has remained mostly the same since (similar enough for the purposes of this question). This is where the exception for unsigned char seems to have been introduced into C++. As far as I could find, the core language issue has no public comments or notes about the rationale for the exception.