Tags: c++, c, multidimensional-array, language-lawyer, undefined-behavior

C/C++ -- Is writing to a multi-dimensional array from offset 0 UB?


Kindly examine the code below:

#include <stdio.h>

#define N 2
#define M 2

int main(void)
{
    int two_d[N][M];
    for(size_t i = 0; i < N*M; ++i) {
        two_d[0][i] = i;  // <---- Pay attention to this line!
    }
    for(size_t i = 0; i < N; ++i) {
        for(size_t j = 0; j < M; ++j) {
            printf("%d\n", two_d[i][j]);
        }
    }
    return 0;
}

Please don't be quick to dismiss this example as contrived -- yours truly found it in a very real and quite well-known project (one too famous to name).

I would appreciate a phone number of a good language lawyer!

  1. On the one hand, the memory is guaranteed to be laid out contiguously, so I'm never accessing anything outside the two_d object as a whole;
  2. On the other hand, I'm clearly accessing memory beyond the first 1D array, two_d[0] -- and doing that is UB.

The example compiles and runs fine on my machine. Mr. Godbolt shows that both C and C++ compilers do the same thing, and with optimizations enabled both handle it without complaint.

So, the questions are:

  1. Is this legal in C?
  2. Is this legal in C++?

Standards quotes would be appreciated.


Solution

  • In C++, the meaning of the subscript expression is given in expr.sub:

    With the built-in subscript operator, an expression-list shall be present, consisting of a single assignment-expression. One of the expressions shall be a glvalue of type “array of T” or a prvalue of type “pointer to T” and the other shall be a prvalue of unscoped enumeration or integral type. The result is of type “T”. The type “T” shall be a completely-defined object type. The expression E1[E2] is identical (by definition) to *((E1)+(E2)), except that in the case of an array operand, the result is an lvalue if that operand is an lvalue and an xvalue otherwise.

    Following up with the rules for + in expr.add:

    When an expression J that has integral type is added to or subtracted from an expression P of pointer type, the result has the type of P.

    • If P evaluates to a null pointer value and J evaluates to 0, the result is a null pointer value.
    • Otherwise, if P points to an array element i of an array object x with n elements ([dcl.array]), the expressions P + J and J + P (where J has the value j) point to the (possibly-hypothetical) array element i + j of x if 0 <= i + j <= n and the expression P - J points to the (possibly-hypothetical) array element i - j of x if 0 <= i - j <= n.
    • Otherwise, the behavior is undefined.

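    Your code snippet invokes undefined behavior. Here P is two_d[0] decayed to a pointer to element 0 of an array of n = M = 2 ints, and J is the loop counter. For J == 2 the result is the past-the-end pointer of two_d[0], which must not be dereferenced even though two_d[1][0] lives at that address; for J == 3, i + j = 3 > n, so the addition itself is undefined.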


  • In C, the rules are very similar. From 6.5.2.1/2, array subscripting:

    A postfix expression followed by an expression in square brackets [] is a subscripted designation of an element of an array object. The definition of the subscript operator [] is that E1[E2] is identical to (*((E1)+(E2))). Because of the conversion rules that apply to the binary + operator, if E1 is an array object (equivalently, a pointer to the initial element of an array object) and E2 is an integer, E1[E2] designates the E2-th element of E1 (counting from zero).

    Then, from 6.5.6/8, additive operators:

    When an expression that has integer type is added to or subtracted from a pointer, the result has the type of the pointer operand. If the pointer operand points to an element of an array object, and the array is large enough, the result points to an element offset from the original element such that the difference of the subscripts of the resulting and original array elements equals the integer expression. In other words, if the expression P points to the i-th element of an array object, the expressions (P)+N (equivalently, N+(P)) and (P)-N (where N has the value n) point to, respectively, the i+n-th and i−n-th elements of the array object, provided they exist. Moreover, if the expression P points to the last element of an array object, the expression (P)+1 points one past the last element of the array object, and if the expression Q points one past the last element of an array object, the expression (Q)-1 points to the last element of the array object. If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is undefined. If the result points one past the last element of the array object, it shall not be used as the operand of a unary * operator that is evaluated.

    Just like in C++, it's undefined behavior to go outside the bounds of an array, with no special exemption for "but what if there's another array right next to it".