Tags: arrays, c, language-lawyer, declaration, language-design

Why must a comma expression used as an array size be enclosed in parentheses if part of an array declarator?


I just noticed that int arr2[(777, 100)] is a legal array declarator, while int arr1[777, 100] is not.

A more complete example that can be compiled:

#include <stdio.h>

void f(int i) {
    printf("side effect %d\n", i);
}

int main(void) {
    // int arr1[777, 100]; // illegal, it seems
    int arr2[(777, 100)];
    int arr3[(f(777), 100)];

    arr2[10, 20] = 30;
    arr3[f(40), 50] = 60;

    return 0;
}

which compiles fine with GCC (but for some reason not with MSVC).

Note, crucially, that the code above also shows that comma expressions are fine outside a declarator, for example as an array subscript.
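To make the subscript case concrete, here is a minimal sketch. The names `note`, `comma_calls` and `write_through_comma_subscript` are made up for illustration and are not part of the question. It relies only on the fact that a subscript takes a full expression, so the left operand of the comma is evaluated and discarded and the right operand becomes the index.

```c
#include <assert.h>

int comma_calls = 0;  /* counts evaluations of the left operand (illustrative) */

/* Hypothetical stand-in for f() in the question:
   has a side effect and returns its argument. */
int note(int i) {
    comma_calls++;
    return i;
}

/* Writes through a comma subscript and returns what ended up at index 20. */
int write_through_comma_subscript(void) {
    int arr[100] = {0};

    /* postfix-expression [ expression ]: the comma operator is allowed here.
       note(10) is evaluated for its side effect only; the index is 20. */
    arr[note(10), 20] = 30;

    assert(arr[10] == 0);  /* index 10 was never written */
    return arr[20];
}
```

Calling `write_through_comma_subscript()` once should return 30 and leave `comma_calls` at 1, which matches the behaviour of `arr3[f(40), 50] = 60` in the question.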

The reason parentheses are needed in array declarators appears to be that, in the C standard, the array size in square brackets is an assignment-expression rather than an expression (C17 draft, 6.7.6.2 ¶3 and A.2.2); expression is the syntactic level at which the comma operator appears (C17 draft, A.2.1 (6.5.17)):

expression:
assignment-expression
expression , assignment-expression

If one expands assignment-expression, one ultimately gets to the level of primary-expression, where a parenthesized expression, and with it the comma operator, is allowed again (C17 draft, A.2.1 (6.5.1)):

primary-expression:
identifier
constant
string-literal
( expression )
generic-selection
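The grammar walk above can be sketched in code. This is only an illustration under stated assumptions: the helper name `parenthesized_comma_size` and its `seen` parameter are hypothetical, and the compiler is assumed to support variable length arrays (optional since C11, available in GCC and Clang). Because a constant expression may not contain an evaluated comma operator (C17 draft, 6.6 ¶3), the array here is a VLA; a compiler without VLA support would be expected to reject it, which may also bear on the MSVC observation in the question.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: declares an array whose size is a parenthesized comma
   expression and returns its element count. *seen records the side effect. */
size_t parenthesized_comma_size(int *seen) {
    /* ( expression ) is a primary-expression, hence also an
       assignment-expression, so the declarator grammar accepts it.
       The size is not a constant expression, so this is a VLA. */
    int arr[(*seen = 777, 100)];

    /* int bad[*seen = 777, 100];   syntax error: comma without parentheses */

    assert(sizeof arr == 100 * sizeof(int));
    return sizeof arr / sizeof arr[0];
}
```

With `int seen = 0;`, calling `parenthesized_comma_size(&seen)` should return 100 and set `seen` to 777: the left operand is evaluated, and only the right operand determines the size.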

If the standard says so, that's the way it is, but is there a syntactic necessity for it? Perhaps there is a reason rooted in language design. The C standard lists the following four forms of array declarators (C17 draft, 6.7.6.2 ¶3):

D [ type-qualifier-list_opt assignment-expression_opt ]
D [ static type-qualifier-list_opt assignment-expression ]
D [ type-qualifier-list static assignment-expression ]
D [ type-qualifier-list_opt * ]

  • One potential reason I can see is to keep this syntax simple, with the two forms containing static in mind.
  • Another potential reason might be to prevent people from writing int arr[m, n] when they really mean int arr[m][n].

Incidentally, if anyone has comments about why this doesn't compile with MSVC, that would also be appreciated.


Solution

  • As mentioned in the question, not accepting multiple comma-separated expressions prevents mistakes by people accustomed to other programming languages that use that syntax for multiple array dimensions.

    Specifying assignment-expression rather than expression in the grammar excludes only the comma operator. Besides the reason already conjectured above, the only effect I can see is on macro use. Inside the argument list of a function-like macro, int a[3, 4] would be split into two arguments [1], whereas int a[(3, 4)] would be one. But without some example use case for a macro involving that, I do not see it as the reason.

    Footnote

    [1] For example, Foo(int a[3, 4]) would be parsed as invoking Foo with one argument of int a[3 and another of 4].
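The footnote's point can be checked with the stringizing operator. The macros and function names below (`FIRST_OF_TWO`, `SECOND_OF_TWO`, `ONLY_ARG`, `split_first`, and so on) are hypothetical and exist only for this sketch; the one thing it relies on is that the preprocessor treats parentheses, but not square brackets, as protecting a comma inside a macro argument.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical macros that turn the arguments they received into strings. */
#define FIRST_OF_TWO(a, b)  #a
#define SECOND_OF_TWO(a, b) #b
#define ONLY_ARG(a)         #a

/* Square brackets do not protect the comma: the preprocessor sees two arguments. */
const char *split_first(void)  { return FIRST_OF_TWO(int a[3, 4]); }
const char *split_second(void) { return SECOND_OF_TWO(int a[3, 4]); }

/* Parentheses do protect it: the preprocessor sees one argument. */
const char *kept_whole(void)   { return ONLY_ARG(int a[(3, 4)]); }

/* Self-check of the argument split described in the footnote. */
void check_macro_argument_split(void) {
    assert(strcmp(split_first(),  "int a[3") == 0);
    assert(strcmp(split_second(), "4]") == 0);
    assert(strcmp(kept_whole(),   "int a[(3, 4)]") == 0);
}
```

Passing `int a[3, 4]` to `ONLY_ARG` would instead fail to preprocess, because a one-parameter macro would be given two arguments.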