Search code examples
compiler-construction

Can the compiler do some type inference in the grammer parse stage?


Can the compiler do some type inference in the grammer parse stage?some expressions like this:

1+"a"

I know the compiler can do the type inference like this:

| Add(e1, e2) | Sub(e1, e2) ->
    unify Type.Int (g env e1);
    unify Type.Int (g env e2);
    Type.Int

but I want to know can we write the production in the grammer parse file(any kind of yacc file) carefully,so the error can be caught in the grammer parse stage?

If the compiler can do this,where I can find the example?

If can't do this,the reason or the theory is what?Which book or paper can explain the reason that the parse stage can't do this?

Thanks!


Solution

  • You can write grammar rules which reject some type errors. But there are huge limitations to the possibilities, assuming a context-free grammar [Note 1]:

    1. You can only detect type errors which involve a finite number of predefined types. That might be feasible for a simple calculator, but it's not going to work with a non-toy language.

    2. A context-free grammar cannot establish the type of a variable, unless you require Hungarian notation, a design decision which isn't likely to be universally celebrated.

    So that leaves you, at best, a grammar which divides expressions into a small finite set of primitive types and a catch-all "unknown type" expression. Since pretty well any expression which includes a sub-expression of "unknown type" will itself have unknown type (and therefore type errors cannot be detected), you're going to have to do type inference after the parse for a large number of expressions, basically any expression which is not a compile-time constant. So you're really not saving any code.

    Furthermore, it's very hard to produce really good error messages from a parse failure. So the type errors you can catch in the grammar are likely to trigger incomprehensible error messages (or cost you a lot of work trying to produce understandable error messages). By contrast, it's relatively easy to produce good informative error messages during type inference [Note 2].

    In short, trying to write a grammar which attempts to do type checking:

    1. Complicates the grammar,

    2. Complicates error reporting,

    3. And doesn't catch most type errors, so you still need to do type inference after the parse.


    Notes:

    1. The fundamental restriction of a context-free grammar is that every non-terminal has the same set of possible derivations regardless of where it appears in the parse. That makes it impossible to write a rule which requires two instances of a non-terminal to expand to the same string, or even be strings which are related in some way ("have the same type", "name was previously declared", etc.). That's essentially what "context-free" means.

      If you have a fixed finite number of categories, you can achieve correlation by creating a different non-terminal for each category. But that generally leads to quadratic blow-up of the grammar, and not infrequently to hard-to-correct parser table conflicts.

    2. Unless you're writing a C++ compiler, apparently.