Tags: c, multidimensional-array, c99, c-standard-library

C99 nested arrays undefined behaviour


In our lecture we recently looked at the C99 standard's rule on pointer equality (6.5.9, paragraph 6) and applied it to nested arrays. Among other cases, it states that two pointers compare equal if "one is a pointer to one past the end of one array object and the other is a pointer to the start of a different array object that happens to immediately follow the first array object in the address space".

The professor then explained that this is why the array access a[0][19] is technically undefined for a nested array with dimensions 4*5. Is this true? If so, why are negative indices such as a[1][-1] defined?


Solution

  • Neither a[0][19] nor a[1][-1] has behavior defined by the C standard.

    C 2018 6.5.2.1 2 tells us that array subscripting is defined in terms of pointer arithmetic:

    A postfix expression followed by an expression in square brackets [] is a subscripted designation of an element of an array object. The definition of the subscript operator [] is that E1[E2] is identical to (*((E1)+(E2)))

    Thus a[0][19] is identical to *(a[0] + 19) (where some parentheses have been omitted because they are unnecessary), and a[1][-1] is identical to *(a[1] + -1).

    In a[0] + 19 and a[1] + -1, a[0] and a[1] are arrays. In these expressions, they are automatically converted to pointers to their first elements, per C 2018 6.3.2.1 3. So these expressions are equivalent to p + 19 and q + -1, where p and q are the addresses of those first elements, &a[0][0] and &a[1][0], respectively.
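    The two rules above can be checked with indices that stay in range. This is a minimal sketch, assuming a 4*5 array of int as in the question; the helper names get_via_pointers and row_start are hypothetical and exist only for illustration.

    ```c
    #include <assert.h>
    #include <stddef.h>

    /* Hypothetical helper: reads a[i][j] through the pointer form of subscripting. */
    int get_via_pointers(int (*a)[5], size_t i, size_t j)
    {
        assert(j < 5);            /* keep the inner index inside one row */
        return *(*(a + i) + j);   /* E1[E2] is identical to *((E1) + (E2)), applied twice */
    }

    /* Hypothetical helper: row i, converted to a pointer to its first element. */
    int *row_start(int (*a)[5], size_t i)
    {
        return a[i];              /* the array a[i] converts to &a[i][0] */
    }
    ```

    For an array declared as int a[4][5], get_via_pointers(a, 1, 2) reads the same element as a[1][2], and row_start(a, 1) compares equal to &a[1][0].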

    C 2018 6.5.6 8 defines pointer arithmetic:

    If the pointer operand points to an element of an array object, and the array is large enough, the result points to an element offset from the original element such that the difference of the subscripts of the resulting and original array elements equals the integer expression. In other words, if the expression P points to the i-th element of an array object, the expressions (P)+N (equivalently, N+(P)) and (P)-N (where N has the value n) point to, respectively, the i+n-th and i−n-th elements of the array object, provided they exist. Moreover, if the expression P points to the last element of an array object, the expression (P)+1 points one past the last element of the array object, and if the expression Q points one past the last element of an array object, the expression (Q)-1 points to the last element of the array object. If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is undefined.

    So p + 19 would point to element 19 of a[0] if it existed. But a[0] is an array of 5 elements, so element 19 does not exist, and therefore the behavior of p + 19 is not defined by the standard.

    Similarly, q + -1 would point to element -1 of a[1], but element -1 does not exist, and therefore the behavior of q + -1 is not defined by the standard.
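    For contrast, here is a sketch of pointer arithmetic that stays inside what 6.5.6 8 defines: every pointer formed points to an element of a single 5-element row or one past its last element. The name sum_row and the constant COLS are assumptions made for this example, not anything from the question.

    ```c
    #include <assert.h>

    enum { COLS = 5 };

    /* Hypothetical helper: sums one row using only pointers the standard defines. */
    int sum_row(const int row[COLS])
    {
        const int *p = row;            /* points to element 0 of the row */
        const int *end = row + COLS;   /* one past the last element: may be formed and compared */
        int sum = 0;

        for (; p != end; ++p)
            sum += *p;                 /* p is dereferenced only while it points to elements 0..4 */

        assert(end - row == COLS);     /* subtraction within the same array is defined */
        return sum;
    }
    ```

    Calling sum_row(a[0]) never forms a pointer beyond a[0] + 5, and that one-past-the-end pointer is never dereferenced.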

    The fact that these arrays are contained within a larger array, and that we know the memory layout of all elements in this larger array, does not matter. The C standard does not define the behavior in terms of the larger memory layout; it specifies behavior based on the specific array in which pointer arithmetic is being evaluated. A C implementation is free to make this arithmetic work like simple address arithmetic and to define the behavior if it wishes, but it is also permitted not to. Compiler optimization has become more sophisticated and aggressive over the years. A compiler may transform these expressions based on the C standard’s rules about arithmetic within a specific array, without regard to the memory layout, and this can cause the expressions to fail (that is, not behave as they would with simple address arithmetic).
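    If the goal is to reach an element by its position in the overall row-major layout, one way to stay within defined behavior is to split the flat position into two in-range subscripts. The following is only a sketch under the question's 4*5 dimensions; flat_get, ROWS and COLS are hypothetical names.

    ```c
    #include <assert.h>
    #include <stddef.h>

    enum { ROWS = 4, COLS = 5 };

    /* Hypothetical helper: element k in row-major order, for 0 <= k < ROWS * COLS. */
    int flat_get(int a[ROWS][COLS], size_t k)
    {
        assert(k < (size_t)ROWS * COLS);
        return a[k / COLS][k % COLS];   /* both subscripts stay within their own array */
    }
    ```

    With this helper, the element that simple address arithmetic would reach through a[0][19] is obtained as flat_get(a, 19), which is a[3][4], and the one intended by a[1][-1] is flat_get(a, 4), which is a[0][4].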