Search code examples
arrayscmultidimensional-arraylanguage-lawyermemory-layout

Is a two-dimensional array implemented as a continuous one-dimensional array?


I have a question about the memory layout of a two-dimensional array. When we define one, just like int a[3][4], is the memory allocated to this array continuous?

Or in other words, is a two-dimensional array implemented as a continuous one-dimensional array?

If the answer is yes, is accessing a[0][6] equivalent to accessing a[1][2]?

I wrote the following C program.

#include <stdio.h>
int main(){
    int a[3][4] = {{1, 2, 3, 4},
                   {5, 6, 7, 8},
                   {9, 10, 11, 12}};
    printf("%d %d\n", a[0][6], a[1][2]);
    return 0;
}

I find that the output is 7 7.

a[0][6] seems illegal, but it points to a[1][2], I want to know why and is such operation legal?


Solution

  • This is as interesting case. Section 6.2.5p20 of the C standard defines an array as follows:

    An array type describes a contiguously allocated nonempty set of objects with a particular member object type, called the element type. The element type shall be complete whenever the array type is specified. Array types are characterized by their element type and by the number of elements in the array. An array type is said to be derived from its element type, and if its element type is T , the array type is sometimes called ‘‘array of T ’’. The construction of an array type from an element type is called ‘‘array type derivation’’

    So an array is a contiguous set of objects of a particular type. In the case of int a[3][4] it is an array of size 3 whose objects are also arrays. The type of the subarray is int [4], i.e. an array of size 4 of type int.

    This means that a 2D array, or more accurately an array of arrays, does indeed have all of the individual members of the inner array laid out continuously.

    This does not however mean that the above array can be accessed as a[0][6] as in your example.

    Section 6.5.2.1p2 regarding the array subscript operator [] states:

    The definition of the subscript operator [] is that E1[E2] is identical to (*((E1)+(E2)))

    And section 6.5.6p8 regarding the + operator when applied to a pointer operand states:

    When an expression that has integer type is added to or subtracted from a pointer, the result has the type of the pointer operand. If the pointer operand points to an element of an array object, and the array is large enough, the result points to an element offset from the original element such that the difference of the subscripts of the resulting and original array elements equals the integer expression. In other words, if the expression P points to the i-th element of an array object, the expressions (P)+N (equivalently, N+(P)) and (P)-N (where N has the value n) point to, respectively, the i+n-th and i−n-th elements of the array object, provided they exist. Moreover, if the expression P points to the last element of an array object, the expression (P)+1 points one past the last element of the array object, and if the expression Q points one past the last element of an array object, the expression (Q)-1 points to the last element of the array object. If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is undefined. If the result points one past the last element of the array object, it shall not be used as the operand of a unary * operator that is evaluated.

    There's a lot to ingest here, but the important part is that given an array of size X, the valid array indices range from 0 to X-1, and attempting to use any other index triggers undefined behavior. In particular, since a[0] has type int [4], attempting to access a[0][6] is going outside the bounds of the array a[0].

    So while a[0][6] in practice would probably work, the C standard makes no guarantee that it will. And given that modern optimizing compilers will aggressively assume undefined behavior doesn't exist in a program and optimize based on that fact, you could end up in a situation where something goes wrong and you don't know why.

    To summarize: yes 2D arrays are implemented that way, and no you can't access them like that.