
When is using an uninitialized variable undefined behavior?


If I have:

unsigned int x;
x -= x;

it seems clear that x should be zero after this expression, but everywhere I look, people say the behavior of this code is undefined, not merely the value of x before the subtraction.

Two questions:

  • Is the behavior of this code indeed undefined?
    (E.g. Might the code crash [or worse] on a compliant system?)

  • If so, why does C say that the behavior is undefined, when it is perfectly clear that x should be zero here?

    i.e. What is the advantage given by not defining the behavior here?

Clearly, the compiler could simply use whatever garbage value it deemed "handy" inside the variable, and it would work as intended... what's wrong with that approach?


Solution

  • Yes, this behavior is undefined, but for different reasons than most people are aware of.

    First, using an uninitialized value is not by itself undefined behavior; the value is simply indeterminate. Accessing it is UB only if the value happens to be a trap representation for the type. Unsigned types rarely have trap representations, so you would be relatively safe on that side.
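    To check the "rarely have trap representations" claim on a given implementation, one can compare the number of value bits in `unsigned int` with the number of bits in its object representation. For unsigned integer types, trap representations can only arise from padding bits, so zero padding bits means every bit pattern is a valid value. This is a minimal sketch; the helper name `value_bits` is hypothetical, chosen only for illustration.

    ```c
    #include <limits.h>
    #include <stdio.h>

    /* Hypothetical helper: count the value bits of an unsigned type by
       shifting its maximum value (all value bits set) until it reaches 0. */
    static unsigned value_bits(unsigned long long max)
    {
        unsigned n = 0;
        while (max != 0) {
            max >>= 1;
            ++n;
        }
        return n;
    }

    int main(void)
    {
        unsigned object_bits = (unsigned)(sizeof(unsigned int) * CHAR_BIT);
        unsigned vbits = value_bits(UINT_MAX);
        unsigned padding = object_bits - vbits;

        printf("unsigned int: %u object bits, %u value bits, %u padding bits\n",
               object_bits, vbits, padding);
        if (padding == 0)
            puts("no padding bits, so unsigned int has no trap representations");
        else
            puts("padding bits present, so trap representations are possible");
        return 0;
    }
    ```

    On common desktop and server platforms this reports zero padding bits; `unsigned char` is guaranteed by the standard to have none.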

    What makes the behavior undefined is an additional property of your variable, namely that it "could have been declared with register", that is, its address is never taken. Such variables are treated specially because some architectures have real CPU registers with a sort of extra "uninitialized" state that doesn't correspond to any value in the type's domain.

    Edit: The relevant phrase of the standard is 6.3.2.1p2:

    If the lvalue designates an object of automatic storage duration that could have been declared with the register storage class (never had its address taken), and that object is uninitialized (not declared with an initializer and no assignment to it has been performed prior to use), the behavior is undefined.
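    The quoted rule requires both conditions at once: the object is never address-taken, and it has neither an initializer nor a prior assignment. A minimal sketch of the question's code made well-defined by giving `x` a value before it is read; the function names are hypothetical.

    ```c
    #include <stdio.h>

    /* Undefined, for contrast (not compiled here):
           unsigned int x;
           x -= x;    // x never address-taken and never assigned
    */

    /* Well-defined: x has an initializer, so 6.3.2.1p2 does not apply. */
    static unsigned int self_subtract_initialized(unsigned int seed)
    {
        unsigned int x = seed;
        x -= x;          /* 0 for any seed */
        return x;
    }

    /* Well-defined: no initializer, but an assignment happens before the
       first read, which also takes x out of 6.3.2.1p2. */
    static unsigned int self_subtract_assigned(unsigned int seed)
    {
        unsigned int x;
        x = seed;
        x -= x;
        return x;
    }

    int main(void)
    {
        printf("%u %u\n", self_subtract_initialized(42u), self_subtract_assigned(7u));
        return 0;
    }
    ```

    This prints `0 0`.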

    And to make it clearer, the following code is legal under all circumstances:

    unsigned char a, b;
    memcpy(&a, &b, 1);
    a -= a;
    
    • Here the addresses of a and b are taken, so their value is just indeterminate.
    • Since unsigned char never has trap representations, that indeterminate value is just unspecified: any value of unsigned char could occur.
    • At the end a must hold the value 0.
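    The last bullet relies only on the fact that whatever `a` holds is some valid `unsigned char` value. The sketch below checks that step for every possible byte pattern instead of reading a truly indeterminate object (whose contents a test cannot observe reliably): each pattern is copied into `a` with `memcpy`, as in the snippet above, and then `a -= a` is applied. The function name `all_patterns_cancel` is hypothetical.

    ```c
    #include <limits.h>
    #include <stdio.h>
    #include <string.h>

    /* For every object representation of unsigned char, copy it into a
       through memcpy and check that a -= a yields 0.
       Returns 1 if every pattern gives 0, otherwise 0. */
    static int all_patterns_cancel(void)
    {
        unsigned char b = 0;           /* current byte pattern */
        do {
            unsigned char a;
            memcpy(&a, &b, 1);
            a -= a;                    /* computed in int, 0 converted back */
            if (a != 0)
                return 0;
        } while (b++ != UCHAR_MAX);    /* stops after testing UCHAR_MAX */
        return 1;
    }

    int main(void)
    {
        printf("every unsigned char pattern cancels: %s\n",
               all_patterns_cancel() ? "yes" : "no");
        return 0;
    }
    ```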

    Edit2: a and b have unspecified values:

    3.19.3 unspecified value
    valid value of the relevant type where this International Standard imposes no requirements on which value is chosen in any instance

    Edit3: Some of this is clarified in C23, where the term "indeterminate value" is replaced by "indeterminate representation" and the term "trap representation" is replaced by "non-value representation". Note also that all of this differs between C and C++, since C++ has a different object model.