Search code examples
c++grammarcontext-free-grammard

Is D's grammar really context-free?


I've posted this on the D newsgroup some months ago, but for some reason, the answer never really convinced me, so I thought I'd ask it here.


The grammar of D is apparently context-free.

The grammar of C++, however, isn't (even without macros). (Please read this carefully!)

Now granted, I know nothing (officially) about compilers, lexers, and parsers. All I know is from what I've learned on the web.
And here is what (I believe) I have understood regarding context, in not-so-technical lingo:

The grammar of a language is context-free if and only if you can always understand the meaning (though not necessarily the exact behavior) of a given piece of its code without needing to "look" anywhere else.

Or, in even less rigor:

The grammar cannot be context-free if I need I can't tell the type of an expression just by looking at it.

So, for example, C++ fails the context-free test because the meaning of confusing<sizeof(x)>::q < 3 > (2) depends on the value of q.

So far, so good.

Now my question is: Can the same thing be said of D?

In D, hashtables can be created through a Value[Key] declaration, for example

int[string] peoplesAges;   // Maps names to ages

Static arrays can be defined in a similar syntax:

int[3] ages;   // Array of 3 elements

And templates can be used to make them confusing:

template Test1(T...)
{
    alias int[T[0]] Test;
}

template Test2(U...)
{
    alias int[U] Test2;  // LGTM
}

Test1!(5) foo;
Test1!(int) bar;
Test2!(int) baz;  // Guess what? It's invalid code.

This means that I cannot tell the meaning of T[0] or U just by looking at it (i.e. it could be a number, it could be a data type, or it could be a tuple of God-knows-what). I can't even tell if the expression is grammatically valid (since int[U] certainly isn't -- you can't have a hashtable with tuples as keys or values).

Any parsing tree that I attempt to make for Test would fail to make any sense (since it would need to know whether the node contains a data type versus a literal or an identifier) unless it delays the result until the value of T is known (making it context-dependent).

Given this, is D actually context-free, or am I misunderstanding the concept?

Why/why not?


Update:

I just thought I'd comment: It's really interesting to see the answers, since:

  • Some answers claim that C++ and D can't be context-free
  • Some answers claim that C++ and D are both context-free
  • Some answers support the claim that C++ is context-sensitive while D isn't
  • No one has yet claimed that C++ is context-free while D is context-sensitive :-)

I can't tell if I'm learning or getting more confused, but either way, I'm kind of glad I asked this... thanks for taking the time to answer, everyone!


Solution

  • Being context free is first a property of generative grammars. It means that what a non-terminal can generate will not depend on the context in which the non-terminal appears (in non context-free generative grammar, the very notion of "string generated by a given non-terminal" is in general difficult to define). This doesn't prevent the same string of symbols to be generated by two non-terminals (so for the same strings of symbols to appear in two different contexts with a different meaning) and has nothing to do with type checking.

    It is common to extend the context-free definition from grammars to language by stating that a language is context-free if there is at least one context free grammar describing it.

    In practice, no programming language is context-free because things like "a variable must be declared before it is used" can't be checked by a context-free grammar (they can be checked by some other kinds of grammars). This isn't bad, in practice the rules to be checked are divided in two: those you want to check with the grammar and those you check in a semantic pass (and this division also allows for better error reporting and recovery, so you sometimes want to accept more in the grammar than what would be possible in order to give your users better diagnostics).

    What people mean by stating that C++ isn't context-free is that doing this division isn't possible in a convenient way (with convenient including as criteria "follows nearly the official language description" and "my parser generator tool support that kind of division"; allowing the grammar to be ambiguous and the ambiguity to be resolved by the semantic check is an relatively easy way to do the cut for C++ and follow quite will the C++ standard, but it is inconvenient when you are relying on tools which don't allow ambiguous grammars, when you have such tools, it is convenient).

    I don't know enough about D to know if there is or not a convenient cut of the language rules in a context-free grammar with semantic checks, but what you show is far from proving the case there isn't.