
Is there a rule for spotting UB?


I read this great, if too broad, question and encountered some UB I didn't know about before.

The cause of UB I run into most often is modifying a variable twice between two sequence points, as in x = x++; or z = y++ + ++y;. Reading that modifying a variable twice between two sequence points is UB helped me understand the underlying cause in these cases.
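A minimal sketch of how both expressions can be rewritten so that each object is modified at most once per full expression. The semicolon ending each statement is a sequence point. The function names are hypothetical. Note that z = y++ + ++y; has no defined meaning at all, so the left-to-right reading below is only one guess at the intent.

```c
/* x = x++;  is UB in C: x is modified twice with no sequence point
   in between.  If the intent is simply "increment x", write: */
int increment(int x)
{
    x++;            /* one modification of x in this full expression */
    return x;
}

/* z = y++ + ++y;  is UB as well.  Assuming (hypothetically) that the
   left operand was meant to be evaluated first, split the side effects
   into separate statements so sequence points separate them: */
int sum_post_then_pre(int y)
{
    int a = y++;    /* a gets the old y, then y is incremented */
    int b = ++y;    /* y is incremented again, b gets the new y */
    return a + b;   /* for y == 1: a == 1, b == 3, result 4 */
}
```

For example, sum_post_then_pre(1) evaluates to 4 (1 + 3), and every compiler must agree on that.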

But what about things like bit-shifting by a negative amount (int x = 8 << -1;)? Is there a rule that explains that, or should I just memorize it as a one-off source of UB?

I looked here, and under the section Integer Overflows I found bit-shifting by a negative amount listed, but I don't understand how the two are related. When an int is shifted by too much, an overflow occurs, but IMO shifting by a negative amount is simply UB, and the problem isn't the bits that fall "over the edge"...

I also looked here, but that didn't answer my question:

The integer promotions are performed on each of the operands. The type of the result is that of the promoted left operand. If the value of the right operand is negative or is greater than or equal to the width of the promoted left operand, the behavior is undefined.
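The quoted rule (C11 6.5.7p3) constrains only the shift count. For a signed left operand, 6.5.7p4 adds a separate requirement: the left operand must be non-negative and the mathematical result must be representable in the result type. That second rule is where the connection to overflow comes from. The sketch below uses a hypothetical helper name and assumes int has no padding bits. It checks the two rules separately so the difference is visible.

```c
#include <limits.h>
#include <stdbool.h>

/* Width of int in bits; assumes int has no padding bits. */
#define INT_BITS ((int)(sizeof(int) * CHAR_BIT))

/* Hypothetical helper: stores value << count in *out and returns true
   only when C11 6.5.7 defines the result; otherwise returns false
   without evaluating the shift. */
bool checked_shl_int(int value, int count, int *out)
{
    /* Rule 1 (6.5.7p3, quoted above): 0 <= count < width.
       This applies even when value is 0, so it is not about overflow. */
    if (count < 0 || count >= INT_BITS)
        return false;

    /* Rule 2 (6.5.7p4, signed operands only): value must be
       non-negative and value * 2^count must fit in an int. */
    if (value < 0)
        return false;
    if (value > (INT_MAX >> count))   /* value * 2^count > INT_MAX */
        return false;

    *out = value << count;
    return true;
}
```

With this helper, the question's 8 << -1 (and even 0 << -1) is rejected by the count check, while INT_MAX << 1 is rejected by the overflow check.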

So my questions are:

  1. Specifically, is bit-shifting by a negative amount considered integer overflow, and if so, why?
  2. If not, is it part of a bigger phenomenon?
  3. Are there (other) unique cases that can't be grouped under one underlying cause?

Solution

  • Specifically, is bit-shifting by a negative amount considered integer overflow, and if so, why?

    It is not: shifting 0 by any amount can never overflow, yet shifting a value of 0 by a negative amount is still undefined behaviour. (You could arguably call it integer overflow if you first reinterpret the shift amount as an unsigned integer: it would then be large and certainly beyond the allowed range, and an actual shift by that amount, viewed as a multiplication by a power of 2, would certainly overflow whenever the shifted value was non-zero.)

    In short, a bit-shift by a negative amount yields undefined behaviour because the language standard says so.
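The unsigned reinterpretation mentioned in the parenthetical also gives one compact way to validate a count before shifting. Converting a negative int to unsigned is well defined (C11 6.3.1.3 wraps it modulo UINT_MAX + 1), so every negative count becomes a large value, and a single unsigned comparison rejects both negative and too-large counts. A minimal sketch with a hypothetical helper name, assuming unsigned has no padding bits:

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper: true if count is a valid shift count for an
   unsigned int operand, i.e. 0 <= count < width.  The cast maps -1 to
   UINT_MAX, -2 to UINT_MAX - 1, and so on, so negative counts fail the
   same comparison as counts that are too large. */
bool shift_count_ok(int count)
{
    return (unsigned)count < sizeof(unsigned) * CHAR_BIT;
}
```

The check runs before any shift is evaluated, so a shift such as 0 << -1 is never performed, even though shifting 0 could not overflow.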

    If not, is it part of a bigger phenomenon?

    John Regehr gives some broad categories of UB in a blog post. Shifts by invalid amounts fall into the "other UB" category...

    Are there (other) unique cases that can't be grouped under one underlying cause?

    Yes; see the post above. Among others (these are lifted directly from the blog post):

    • Pointers that do not point into, or just beyond, the same array object are subtracted (6.5.6).
    • An object has its stored value accessed other than by an lvalue of an allowable type (6.5)
    • A nonempty source file does not end in a new-line character which is not immediately preceded by a backslash character or ends in a partial preprocessing token or comment (5.1.1.2)

    You could categorise these and the other examples in some way, but how you do that is up to you.

    In particular, the last example above (about the source file not ending in a new-line) shows just how arbitrary some of the rules are.
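The second item in the list above is the "strict aliasing" rule. One well-defined alternative to type punning through a pointer cast is copying the bytes with memcpy. A minimal sketch with a hypothetical function name, assuming float is a 32-bit IEEE 754 type (common, but not required by the standard):

```c
#include <stdint.h>
#include <string.h>

/* Reading a float through a uint32_t lvalue, e.g. *(uint32_t *)&f,
   accesses the object with a type that 6.5 does not allow, so it is
   undefined.  Copying the object representation with memcpy is defined. */
uint32_t float_bits(float f)
{
    uint32_t bits;
    _Static_assert(sizeof bits == sizeof f, "assumes a 32-bit float");
    memcpy(&bits, &f, sizeof bits);
    return bits;
}
```

Under the IEEE 754 assumption, float_bits(1.0f) returns 0x3F800000. Compilers typically reduce a fixed-size memcpy like this to a plain register move, so the defined version need not cost anything extra.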