I'm trying to clarify my understanding of what undefined behavior is in a more formal, rigorous manner, especially in the context of the C++ standard.
The following quote from https://en.cppreference.com/w/cpp/language/ub seems incorrect to me:
Because correct C++ programs are free of undefined behavior, compilers may produce unexpected results when a program that actually has UB is compiled with optimization enabled:
To me it does not generally make sense to say that a program itself is/is not free of undefined behavior - it only ever makes sense to me to talk about undefined behavior in the context of a particular execution of the program. For the same program, execution with one input might end up using some operation that is not properly defined (and thus the execution of the program has undefined behavior), for some other input it might not. Sometimes it can be determined that the program execution will/will not produce undefined behavior for every input, but that's generally not the case.
It also seems to be in conflict with the following from the standard: https://eel.is/c++draft/intro.compliance#intro.abstract-5
A conforming implementation executing a well-formed program shall produce the same observable behavior as one of the possible executions of the corresponding instance of the abstract machine with the same program and the same input. However, if any such execution contains an undefined operation, this document places no requirement on the implementation executing that program with that input (not even with regard to operations preceding the first undefined operation).
Is the cppreference quote wrong, or is there some sense in which a "program" itself, not its execution, can have undefined behavior?
Undefined behavior is a property of a program. Generally:
3.65 undefined behavior [defns.undefined]
behavior for which this document imposes no requirements
[- Note 1: Permissible undefined behavior ranges from ignoring the situation completely with unpredictable results, to behaving during translation or program execution in a documented manner characteristic of the environment (with or without the issuance of a diagnostic message), to terminating a translation or execution (with the issuance of a diagnostic message). [...] - end note]
When the standard refers to a program having undefined behavior, it refers to the program generated by the compiler. Undefined behavior can be handled during translation (i.e. compilation) too, as said in the quote above.
For example, a division by zero can be encoded in that program, and that division has an undefined effect once it is executed. When the undefined behavior is handled during translation, you can tell that a program contains it without ever executing the program:
int main() {
    return 1 / 0;
}
GCC with -O2 compiles this to:
main:
        ud2
... with the diagnostic message:
warning: division by zero [-Wdiv-by-zero]
    2 |     return 1 / 0;
      |            ~~^~~
ud2
Generates an invalid opcode exception. This instruction is provided for software testing to explicitly generate an invalid opcode exception. The opcodes for this instruction are reserved for this purpose.
- https://www.felixcloutier.com/x86/ud
However, there are many cases where undefined behavior takes place only for some program input, and you wouldn't see ud2 in the assembly output.
In such a case, the program has undefined behavior for said input, but the undefined behavior only takes effect if you execute it with said input.
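As a minimal sketch of that input-dependent case (the names divide and checked_divide are hypothetical, chosen only for illustration), consider a function whose behavior is undefined for some arguments and well defined for all others. The compiler cannot know the divisor at translation time, so it has nothing to diagnose; whether a given execution has undefined behavior depends entirely on the values passed in.

```cpp
#include <limits>

// Hypothetical helper: well defined for most inputs, but undefined behavior
// if divisor == 0, or if dividend == INT_MIN and divisor == -1 (the
// quotient is not representable in int).
int divide(int dividend, int divisor) {
    return dividend / divisor;
}

// Hypothetical checked variant: rejects the inputs that would make the
// division undefined, so every call has defined behavior.
// Returns true and stores the quotient in result on success.
bool checked_divide(int dividend, int divisor, int& result) {
    if (divisor == 0) {
        return false;  // division by zero would be undefined
    }
    if (dividend == std::numeric_limits<int>::min() && divisor == -1) {
        return false;  // signed overflow would be undefined
    }
    result = dividend / divisor;
    return true;
}
```

Calling divide(10, 2) is fine and yields 5; calling divide(10, 0) makes that particular execution undefined, even though the source code is identical in both cases.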