Tags: c++, c, language-lawyer, undefined-behavior, sequence-points

Differences in C and C++ with sequence points and UB


I used the post "Undefined Behavior and Sequence Points" to document undefined behavior (UB) in a C program, and it was pointed out to me that C and C++ have their own divergent rules for this [sequence points]. So what are the differences between C and C++ when it comes to sequence points and related UB? Can't I use a post about C++ sequence points to analyze what is happening in C code?

* Of course, I am not talking about features of C++ that are not applicable to C.


Solution

  • There are two parts to this question. We can tackle a comparison of the sequence point rules without much trouble. This does not get us too far, though: C and C++ are different languages with different standards (the latest C++ standard is almost twice as large as the latest C standard), and even though C++ uses C as a normative reference, it would be incorrect to quote the C++ standard for C and vice versa, regardless of how similar certain sections may be. The C++ standard does explicitly reference the C standard, but only for small sections.

    The second part is a comparison of undefined behavior between C and C++. There can be some big differences, and enumerating all of them may not be possible, but we can give some indicative examples.

    Sequence Points

    Since we are talking about sequence points, this covers pre-C++11 and pre-C11. As far as I can tell, the sequence point rules do not differ greatly between the C99 and pre-C++11 draft standards. As we will see, the sequence point rules play no part in some of the examples of differing undefined behavior that I give.

    The sequence point rules are covered in section 1.9 Program execution of the draft C++ standard closest to C++03, which says:

    • There is a sequence point at the completion of evaluation of each full-expression.
    • When calling a function (whether or not the function is inline), there is a sequence point after the evaluation of all function arguments (if any) which takes place before execution of any expressions or statements in the function body.
    • There is also a sequence point after the copying of a returned value and before the execution of any expressions outside the function. Several contexts in C++ cause evaluation of a function call, even though no corresponding function call syntax appears in the translation unit. [ Example: evaluation of a new expression invokes one or more allocation and constructor functions; see 5.3.4. For another example, invocation of a conversion function (12.3.2) can arise in contexts in which no function call syntax appears. —end example ] The sequence points at function-entry and function-exit (as described above) are features of the function calls as evaluated, whatever the syntax of the expression that calls the function might be.
    • In the evaluation of each of the expressions

      a && b
      a || b
      a ? b : c
      a , b
      

      using the built-in meaning of the operators in these expressions (5.14, 5.15, 5.16, 5.18), there is a sequence point after the evaluation of the first expression.
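As a rough illustration of the last rule, the sketch below relies on the sequence point after the first operand of `&&`, `,` and `?:`. The function names are hypothetical and chosen only for this example; the code is written in C but is intended to be equally well defined as C++.

```c
#include <stddef.h>

/* Returns 1 if p is non-null and *p is positive, otherwise 0.
   The first operand of && is completely evaluated before the second,
   and *p is not evaluated at all when p is NULL. */
int is_positive(const int *p)
{
    return p != NULL && *p > 0;
}

/* Increments *counter and returns the new value.
   The sequence point of the comma operator means the increment is
   complete before *counter is read as the result. */
int bump_and_read(int *counter)
{
    return (++*counter, *counter);
}

/* Advances *i and wraps it to 0 once it reaches limit.
   The first operand of ?: (including its increment) is complete
   before the selected second or third operand is evaluated. */
int next_or_wrap(int *i, int limit)
{
    return (++*i >= limit) ? (*i = 0) : *i;
}
```

Each function modifies and then reads the same object inside a single expression, which is only acceptable here because a sequence point separates the two accesses.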

    I will use the sequence point list from Annex C of the draft C99 standard. Although the annex is not normative, I can find no disagreement between it and the normative sections it references. It says:

    The following are the sequence points described in 5.1.2.3:

    • The call to a function, after the arguments have been evaluated (6.5.2.2).
    • The end of the first operand of the following operators: logical AND && (6.5.13); logical OR || (6.5.14); conditional ? (6.5.15); comma , (6.5.17).
    • The end of a full declarator: declarators (6.7.5);
    • The end of a full expression: an initializer (6.7.8); the expression in an expression statement (6.8.3); the controlling expression of a selection statement (if or switch) (6.8.4); the controlling expression of a while or do statement (6.8.5); each of the expressions of a for statement (6.8.5.3); the expression in a return statement (6.8.6.4).

    The following entries do not seem to have equivalents in the draft C++ standard, but they come from the C standard library, which C++ incorporates by reference:

    • Immediately before a library function returns (7.1.4).
    • After the actions associated with each formatted input/output function conversion specifier (7.19.6, 7.24.2).
    • Immediately before and immediately after each call to a comparison function, and also between any call to a comparison function and any movement of the objects passed as arguments to that call (7.20.5).

    So there is not much of a difference between C and C++ here.
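To make the function-call and full-expression entries concrete, here is a minimal sketch. The counter `calls` and the helpers `next_id`, `reset_ids`, `ordered_pair` and `unordered_pair` are hypothetical names invented for this example. My understanding is that the second variant is unspecified rather than undefined, because each call is bracketed by sequence points, but the order of the two calls is not fixed.

```c
/* Hypothetical counter used only for this illustration. */
static int calls = 0;

/* Returns 1, 2, 3, ... on successive calls. */
static int next_id(void)
{
    return ++calls;
}

void reset_ids(void)
{
    calls = 0;
}

/* Each initializer is a full expression, so there is a sequence point
   between the two calls and their order is fixed. */
int ordered_pair(void)
{
    int first = next_id();
    int second = next_id();
    return first * 10 + second;
}

/* Both calls sit in one full expression. Each call has its own
   sequence points, but the order in which the operands of + are
   evaluated is unspecified, so either call may run first. */
int unordered_pair(void)
{
    return next_id() * 10 + next_id();
}
```

After `reset_ids()`, `ordered_pair()` yields 12, while `unordered_pair()` may yield 12 or 21 depending on the compiler.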

    Undefined Behavior

    When it comes to the typical examples of sequence points and undefined behavior, for example those covered in section 5 Expressions that deal with modifying a variable more than once between sequence points, I cannot come up with an example that is undefined in one language but not the other. C99 says:

    Between the previous and next sequence point an object shall have its stored value modified at most once by the evaluation of an expression. Furthermore, the prior value shall be read only to determine the value to be stored.

    and it provides these examples:

    i = ++i + 1;
    a[i++] = i;
    

    and in C++ it says:

    Except where noted, the order of evaluation of operands of individual operators and subexpressions of individual expressions, and the order in which side effects take place, is unspecified. Between the previous and next sequence point a scalar object shall have its stored value modified at most once by the evaluation of an expression. Furthermore, the prior value shall be accessed only to determine the value to be stored. The requirements of this paragraph shall be met for each allowable ordering of the subexpressions of a full expression; otherwise the behavior is undefined.

    and provides these examples:

    i = v[i++]; // the behavior is undefined
    i = ++i + 1; // the behavior is undefined
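For comparison, the expressions quoted from both standards can be rewritten so that each object is modified at most once between sequence points. The sketch below is one possible rewrite; the function names are hypothetical, and the intent assigned to each original expression is my assumption, since an undefined expression has no single agreed meaning.

```c
/* Replacement for: i = ++i + 1;
   Assumed intent: i ends up two larger. */
int add_two(int i)
{
    i += 2;
    return i;
}

/* Replacement for: a[i++] = i;
   Assumed intent: store the old value of i at the old index,
   then advance. Returns the advanced index. */
int store_then_advance(int *a, int i)
{
    a[i] = i;      /* full expression ends: sequence point */
    return i + 1;
}

/* Replacement for: i = v[i++];
   The increment would be overwritten by the assignment anyway,
   so a plain lookup is assumed to express the intent. */
int lookup(const int *v, int i)
{
    return v[i];
}
```

Splitting the modification and the read into separate statements keeps the code well defined under the C99 and C++03 wording quoted above.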
    

    In C++11 and C11 we do have one major difference, covered in "Assignment operator sequencing in C11 expressions", which concerns the following expression:

    i = ++i + 1;
    

    This expression is well defined in C++11 but undefined in C11. This is due to the result of pre-increment being an lvalue in C++11 but not in C11, even though the sequencing rules are the same.

    We do have major differences in areas that have nothing to do with sequence points:

    There are probably plenty more examples but these are ones I have written about before.