Search code examples
c++multidimensional-arraypaddingmemory-alignment

Are plain nested array of arrays guaranteed to be contiguous?


I assume that nested std::array cannot assume that memory is strictly contiguous: see for instance: Is the data in nested std::arrays guaranteed to be contiguous? and other related questions. There might be some padding at the end of each sub-array. I cannot find the equivalent information about plain array. Is there a guarantee that all data within a plain array are contiguous in memory or not? for instance with:

uint8_t A[3][7];

Is it safe to say that A[1]-A[0] is 7?
subsidiary question: what should be the behaviour of the sizeof operator on a multidimensional array and on one of its subarray?

NB ISO/IEC JTC1 SC22 WG21 N4860 §9.3.3.4-9 states:

[Note: When several “array of” specifications are adjacent, a multidimensional array type is created; only the first of the constant expressions that specify the bounds of the arrays may be omitted. [Example: int x3d[3][5][7]; declares an array of three elements, each of which is an array of five elements, each of which is an array of seven integers. The overall array can be viewed as a three-dimensional array of integers, with rank 3 × 5 × 7. Any of the expressions x3d, x3d[i], x3d[i][j], x3d[i][j][k] can reasonably appear in an expression. The expression x3d[i] is equivalent to *(x3d + i); in that expression, x3d is subject to the array-to-pointer conversion (7.3.2) and is first converted to a pointer to a 2-dimensional array with rank 5 × 7 that points to the first element of x3d. Then i is added, which on typical implementations involves multiplying i by the length of the object to which the pointer points, which is sizeof(int)×5 × 7. The result of the addition and indirection is an lvalue denoting the ith array element of x3d (an array of five arrays of seven integers). If there is another subscript, the same argument applies again, so x3d[i][j] is an lvalue denoting the jth array element of the ith array element of x3d (an array of seven integers), and x3d[i][j][k] is an lvalue denoting the kth array element of the jth array element of the ith array element of x3d (an integer). —end example] The first subscript in the declaration helps determine the amount of storage consumed by an array but plays no other part in subscript calculations. —end note]

If I understand it correctly, the data should be contiguous but the "on typical implementations" part bothers me. Thus I still need confirmation. (In which case, I fail to understand why std::array would not have the same guarantee).

[EDIT] "contiguous" notion happens to be misleading in the case of nested sequence and I think that existing post on stackoverflow are not completly clear about that.
Thus my actual questions would be:

  1. is the memory size only the sum of the size of all elements (no padding), thus here sizeof(A) would be sizeof(std::uint8_t*3*7)?
  2. is A a contiguous sequence of std::uint8_t meaning that:
void foo(std::uint8_t);
std::uint8_t* p = &A([0][0]);
for (size_t i = 0;i < 3*7;++i,++p) {
   foo(*p);
}

are valid?
3. is an expression like std::memset(A,0,sizeof(std::uint8_t)*3*7) valid?


Solution

  • I sum up here the various answers and comments regarding nested arrays such as

    std::uint8_t A[3][7]
    

    Relevant questions would be:

    1. is the memory size only the sum of the size of all elements (no padding), thus here sizeof(A) would be sizeof(std::uint8_t*3*7)?
    2. is A a contiguous sequence of std::uint8_t meaning that:
    void foo(std::uint8_t);
    std::uint8_t* p = &A([0][0]);
    for (size_t i = 0;i < 3*7;++i,++p) {
       foo(*p);
    }
    

    is valid?
    3. is an expression like std::memset(A,0,sizeof(std::uint8_t)*3*7) valid?

    From ISO/IEC JTC1 SC22 WG21 N4860 §7.6.2.4-2, definition of the sizeof operator:

    When applied to an array, the result is the total number of bytes in the array. This implies that the size of an array of n elements is n times the size of an element

    By immediate recursion, it leaves no other possibility than the fact that nested plain arrays size is the total size of its sub-elements (no padding bytes of any kind).
    Yet it does not implies that it is contiguous with respect to the most nested type.

    In order to understand that, I must give my own definition of "contiguous memory" as I failed to find an explicit definition within the standard. Thus my definition would be:
    Saying that the memory locations of a sequence of objects are contiguous means that, from a pointer to the first element of the sequence, successively incrementing the pointer gives successive locations of the sequence objects, up to one past the last element.

    Indeed from ISO/IEC JTC1 SC22 WG21 N4860 §7.6.6:

    When an expression J that has integral type is added to or subtracted from an expression P of pointer type, the result has the type of P...
    (4.2) - Otherwise, if P points to an array element i of an array object x with n elements (9.3.3.4),76 the expressions P + J and J + P (where J has the value j) point to the (possibly-hypothetical) array element i + j of x if 0 <= i + j <= n and the expression P - J points to the (possibly-hypothetical) array element i − j of x if 0 <= i − j <= n.
    (4.3) — Otherwise, the behavior is undefined.

    Besides:

    When two pointer expressions P and Q are subtracted,...
    (5.2) — Otherwise, if P and Q point to, respectively, array elements i and j of the same array object x, the expression P - Q has the value i − j. (5.3) — Otherwise, the behavior is undefined.

    Yet A[x] is not a pointer but a std::uint8_t[7] that can be decayed to a pointer to its first element. Then, despite of the contiguous memory characteristic, A[1] and A[0] then decays to pointer to data in "semantically" different arrays. Thus as (4.3) is stating, it is undefined behavior to try to evaluate A[1]-A[0].
    On the other hand &A[x] is the address of the xth element of A, A[1] and A[0] are elements of the same array A and, so, &A[1]-&A[0] is legal and evaluate to 1 (see 4.2).

    Thus the answer to 2 is no: it is undefined behaviour.

    Besides

    Multidimensional arrays description in ISO/IEC JTC1 SC22 WG21 N4860 §9.3.3.4-9 is only, IMHO, slightly confusing. I just reproduced it here, for the record:

    [Note: When several “array of” specifications are adjacent, a multidimensional array type is created; only the first of the constant expressions that specify the bounds of the arrays may be omitted. [Example: int x3d[3][5][7]; declares an array of three elements, each of which is an array of five elements, each of which is an array of seven integers. The overall array can be viewed as a three-dimensional array of integers, with rank 3 × 5 × 7. Any of the expressions x3d, x3d[i], x3d[i][j], x3d[i][j][k] can reasonably appear in an expression. The expression x3d[i] is equivalent to *(x3d + i); in that expression, x3d is subject to the array-to-pointer conversion (7.3.2) and is first converted to a pointer to a 2-dimensional array with rank 5 × 7 that points to the first element of x3d. Then i is added, which on typical implementations involves multiplying i by the length of the object to which the pointer points, which is sizeof(int)×5 × 7. The result of the addition and indirection is an lvalue denoting the ith array element of x3d (an array of five arrays of seven integers). If there is another subscript, the same argument applies again, so x3d[i][j] is an lvalue denoting the jth array element of the ith array element of x3d (an array of seven integers), and x3d[i][j][k] is an lvalue denoting the kth array element of the jth array element of the ith array element of x3d (an integer). —end example] The first subscript in the declaration helps determine the amount of storage consumed by an array but plays no other part in subscript calculations. —end note]

    Regarding question 3, my temporary answer would be: it is valid, if the most nested type is trivially copyable.
    From cppreference about std::memset, possible requirements are:

    1. std::size_t count argument must be <= to the size in bytes of the array, which is the case.
    2. the object pointed by void* dest is trivially copyable which is the case, by immediate recursion if the most nested type is trivially copyable.