
How does C's grammar allow this unnamed nested struct?


The K&R and Microsoft grammars for C indicate that this simple struct:

struct { int a ; } ;

matches a declaration while parsing like this:

declaration
 declarationSpecifier ';'
  typeSpecifier ';'
   structOrUnionSpecifier ';'
    structOrUnion '{' structDeclarationList '};'
     'struct {' structDeclaration '};'
      'struct {' specifierQualifierList structDeclaratorList ';};'
       'struct {' typeSpecifier structDeclarator ';};'
        'struct { int' declarator ';};'
         'struct { int' directDeclarator ';};'
          'struct { int' identifier ';};'
           'struct { int a ;};'

and is therefore legal. But if we nest it inside another struct, like

struct { struct { int a ; } ; } ;

the parsing becomes:

declaration
 declarationSpecifier ';'
  typeSpecifier ';'
   structOrUnionSpecifier ';'
    structOrUnion '{' structDeclarationList '};'
     'struct {' structDeclaration '};'
      'struct {' specifierQualifierList structDeclaratorList ';};'
       'struct {' typeSpecifier structDeclarator ';};'
        'struct {' structOrUnionSpecifier declarator ';};'
         'struct {' structOrUnion '{' structDeclarationList '}' directDeclarator ';};'
          'struct { struct {' structDeclaration '}' identifier ';};'

which at this point forces an identifier where we don't want one, as in struct{struct{int a;}b;}; and yet, if we ignore that rule and leave out the b, struct{struct{int a;};}; still compiles with no problem.

How and why is struct{struct{int a;};}; compilable when the grammar indicates an additional identifier is required? Have I misinterpreted the grammar or is the actual grammar not what I think it is?

EDIT:

"Unparsing" the desired struct{struct{int a;};}; from the bottom up gives:

'struct { struct { int a ;};};'
'struct { struct { int' identifier ';};};'
'struct { struct { int' directDeclarator ';};};'
'struct { struct { int' declarator ';};};'
'struct { struct {' typeSpecifier structDeclarator ';};};'
'struct { struct {' specifierQualifierList structDeclaratorList ';};};'
'struct { struct {' structDeclaration  '};};'
'struct {' structOrUnion '{' structDeclarationList '};};'
'struct {' structOrUnionSpecifier ';};'
'struct {' typeSpecifier ';};'
'struct {' declarationSpecifier ';};'
'struct {' declaration '};'

but we're stuck again, because a struct{ must be followed by a structDeclaration, not a declaration.


Solution

  • How does C's grammar allow this unnamed nested struct?

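    The K&R C grammar does not allow this anonymous nested structure.

    This changed in C11, which added anonymous structures and unions: the struct-declarator-list in a struct-declaration became optional, so a nested struct specifier with no tag may be followed directly by the semicolon.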

    How and why is struct{struct{int a;};}; compilable when the grammar indicates an additional identifier is required?

    Why not? Compilers have extensions. Technically, the C standard specifies what must happen with valid programs. Compilers have a lot of latitude when given an invalid program: beyond issuing a diagnostic, the standard places no requirements on what they do with it, so they may accept it as an extension.

    Have I misinterpreted the grammar

    No, it doesn't look like it.

    or is the actual grammar not what I think it is?

    I compared it with section "3.5 DECLARATIONS" of the C89 draft at https://port70.net/~nsz/c/c89/c89-draft.txt and your interpretation looks fine.

    The whole point of the question is how unnamed nested structs are compiling

    It's an MSVC extension that allows anonymous structures (see https://learn.microsoft.com/en-us/cpp/build/reference/microsoft-extensions-to-c-and-cpp?view=msvc-170 ). It is quite confusing.
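
To make the C11 behavior concrete, here is a minimal sketch of an anonymous nested struct whose member is reached directly through the enclosing struct. The names outer and sum_members are hypothetical, chosen only for illustration. It assumes a compiler in C11 mode or later, or one that accepts anonymous structs as an extension.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example type: the inner struct has no tag and no
   declarator, so it is an anonymous structure (standard since C11). */
struct outer {
    struct {
        int a;
    };      /* no member name between '}' and ';' */
    int b;
};

/* Hypothetical helper: the member 'a' of the anonymous struct is
   accessed as if it were a direct member of struct outer. */
static int sum_members(const struct outer *o)
{
    assert(o != NULL);
    return o->a + o->b;
}
```

After setting o.a = 1 and o.b = 2 on a struct outer o, sum_members(&o) returns 3. With GCC or Clang this should compile cleanly under -std=c11; in a strict older mode such as -std=c99 -pedantic the compiler would typically accept it with a diagnostic about unnamed structs, which is the extension behavior described above.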