c++ undefined-behavior c++26 erroneous-behavior

What is erroneous behavior? How is it different from undefined behavior?


C++26 has introduced erroneous behavior in addition to undefined, unspecified, and implementation-defined behavior (see Undefined, unspecified and implementation-defined behavior). How is this new construct different from the existing kinds of behavior, and why has it been added to the C++ standard?


Solution

  • Erroneous behavior is "buggy" or "incorrect" behavior, as explained by P2795: Erroneous behavior for uninitialized reads. This proposal introduced erroneous behavior into C++26 and used it to turn reads of uninitialized automatic variables, which were previously undefined behavior, into erroneous behavior.

    The most notable difference is that undefined behavior has no limit as to what the program may do, including jumping into "random" functions, accessing memory that shouldn't be accessed, and other effects which are detrimental to security. Erroneous behavior is formally ([defns.erroneous]):

    well-defined behavior that the implementation is recommended to diagnose

    It may be diagnosed through warnings, run-time errors, etc.; formally, [intro.abstract] p5, sentence 3 explains:

    If the execution contains an operation specified as having erroneous behavior, the implementation is permitted to issue a diagnostic and is permitted to terminate the execution at an unspecified time after that operation.

    Motivation

    Unfortunately, a substantial amount of C++ code is not bug-free, and many bugs can be harmful to security. An obvious example is this:

    void g() {
        void (*f)(); // uninitialized local function pointer;
                     // basically an abstraction for an instruction address
        // ...
        f();         // what address do we jump to?
    }
    

    If f is stored on the stack, an attacker may be able to arrange for that stack memory to hold a value of their choice before this code runs. f() would then let the attacker jump to any instruction in the program they want. There are many more such cases of CWE-457: Use of Uninitialized Variable.

    Simply making this code "correct" by default-initializing the function pointer to nullptr wouldn't make sense either, since there is clearly a bug here: f should have been initialized, and if we forgot to initialize it before calling it, storing it somewhere, etc., the compiler should draw our attention to that. We wouldn't want this bug to simply be "swept under the rug".
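
    A minimal sketch of the alternative this paragraph argues for, using a hypothetical `call_checked` helper (not part of the standard or any library). An empty `std::optional` makes the "forgot to initialize" state explicit. Calling through it then reports the bug and terminates, instead of jumping to an arbitrary address or silently doing nothing. This hand-written check mirrors the diagnose-and-terminate option the standard permits for erroneous behavior. It is not what the standard requires for `f()` above.

```cpp
#include <cstdio>
#include <cstdlib>
#include <optional>

using Callback = void (*)();

// Hypothetical helper: an empty optional models a callback that was
// never initialized.
void call_checked(const std::optional<Callback>& cb) {
    if (!cb.has_value() || *cb == nullptr) {
        // Report the bug loudly instead of sweeping it under the rug.
        std::fputs("bug: callback called before it was initialized\n", stderr);
        std::abort();
    }
    (*cb)(); // cb holds a real function address here
}

int calls = 0;
void on_event() { ++calls; }
```

    Calling `call_checked(on_event)` invokes the callback. Calling `call_checked(std::nullopt)` prints the message and aborts.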

    How does erroneous behavior work?

    Erroneous behavior starts with an erroneous value, which is produced, for example, when a variable with automatic storage duration is declared without an initializer. On a side note, the pre-C++26 behavior can be reproduced using the [[indeterminate]] attribute:

    void f(int);
    
    void g() {
        int indet [[indeterminate]]; // indet has indeterminate value
        int erron;                   // erron has erroneous value ([basic.indet])
    
        f(indet); // undefined behavior
        f(erron); // erroneous behavior
    }
    

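    As explained above, undefined behavior could do anything here, including jumping into a function other than f. By contrast, f(erron) has well-defined behavior: f receives a value that the implementation chose independently of the program's state, and the implementation is recommended to diagnose the read.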
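
    Why opt back into indeterminate values at all? Here is a sketch with a hypothetical `checksum` function. Its local scratch buffer is always written before it is read, so it gains nothing from an erroneous value. `[[indeterminate]]` lets the implementation skip whatever initialization it might otherwise perform to produce that value. The attribute is C++26. Older compilers are expected to ignore it (typically with a warning), which leaves the code with its pre-C++26 meaning.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical example: sums up to 256 bytes after copying them into a
// local scratch buffer.
unsigned checksum(const unsigned char* src, std::size_t n) {
    // [[indeterminate]] (C++26) opts this buffer out of erroneous values.
    // That is only safe because every byte read below is written by
    // memcpy first; reading any other byte would rely on an
    // indeterminate value.
    unsigned char scratch [[indeterminate]] [256];
    if (n > sizeof scratch) {
        n = sizeof scratch;
    }
    std::memcpy(scratch, src, n);
    unsigned sum = 0;
    for (std::size_t i = 0; i < n; ++i) {
        sum += scratch[i];
    }
    return sum;
}
```

    The trade-off is the one discussed under the cost of erroneous behavior: the attribute removes a possible initialization cost, but any mistake brings back undefined behavior.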

    Erroneous vs. ill-formed

    Erroneous behavior may seem similar to ill-formedness, since both are meant to result in a diagnostic (see also [intro.compliance.general] p8).

    However, erroneous behavior comes into effect during program execution, whereas a program is ill-formed during translation (compilation). For example:

    int x = float; // ill-formed; not valid C++ code,
                   // shall be diagnosed
    
    void g() {
        int y;     // well-formed (valid C++ code), but y has erroneous value
        int z = y; // erroneous behavior, should be diagnosed
    }
    

    Erroneous behavior in constant expressions

    Erroneous behavior always disqualifies an expression from being a constant expression ([expr.const] p5.8). Undefined behavior does the same in most cases, but not always. For example, violated preconditions of most standard library functions, or failed [[assume]] attributes, can still result in undiagnosed UB inside constant expressions.

    Also, a constexpr variable cannot have erroneous value, because it must be initialized, and that initialization must be a constant expression:

    constexpr int x; // error: a constexpr variable must be initialized
    

    In C++23, this was ill-formed for the same reason.

    The broader picture

    In general, C++ developers and the C++ committee are pushing the language in a "safer" direction. As part of that, much of today's undefined behavior could be turned into erroneous behavior over the coming years.

    In some cases, there is already a well-motivated proposal for this, such as P2973: Erroneous behavior for missing return from assignment. Other cases of undefined behavior, like signed integer overflow or division by zero, could also be made erroneous.

    Harder-to-diagnose forms of UB, such as data races or invalid downcasts (with static_cast), will likely remain undefined, perhaps indefinitely.

    The cost of erroneous behavior

    Compilers increasingly rely on undefined behavior for optimization. For example:

    int f(int i) {
        int arr[1] { 123 };
        return arr[i];
    }
    

    The compiler can optimize this down to:

    f(int):
        mov eax, 123
        ret
    

    If i were anything other than 0, arr would be accessed out of bounds, which is undefined behavior. The compiler is allowed to assume that UB never happens and optimize accordingly. If out-of-bounds array access were turned into erroneous behavior, the compiler could no longer make that assumption. It would be encouraged to add a run-time bounds check to the array access and terminate the program if i is not 0.
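
    To make the cost concrete, here is a hand-written approximation of such a bounds check. It uses a hypothetical `f_checked` function and assumes the implementation chose to diagnose and then terminate on a bad index. The extra comparison and branch are what the UB-based version above is allowed to omit.

```cpp
#include <cstdio>
#include <cstdlib>

// Hypothetical: f from above, plus the check an implementation might insert
// if out-of-bounds access were erroneous rather than undefined.
int f_checked(int i) {
    constexpr int size = 1;
    int arr[size] { 123 };
    if (i < 0 || i >= size) {
        // Diagnose, then terminate: both are permitted for erroneous behavior.
        std::fprintf(stderr, "out-of-bounds index %d\n", i);
        std::abort();
    }
    return arr[i]; // only reached with i == 0
}
```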

    In conclusion, erroneous behavior isn't "free"; it comes at a performance cost. Erroneous behavior is typically added to the C++ standard where the security risk of undefined behavior is significant, and where the cause of undefined behavior is not commonly used for optimizations.