Tags: c++, parsing, antlr, antlr4, abstract-syntax-tree

ANTLR4: Re-visiting parse rules after the whole AST is visited


I am implementing generic functions for my own language, but I am stuck on the following problem:

Generic functions can get called from another source file (another parser instance). Let's assume we have a generic function in source file B and we call it from source file A, which imports source file B. When this happens, I need to type-check the body of the function (source file B) once again for every distinct manifestation of concrete types, derived from the function call (source file A). For that, I need to visit the body of the function in source file B potentially multiple times.

Source file B:

type T dyn;

public p printFormat<T>(T element) {
    printf("Test");
}

Source file A:

import "source-b" as b;

f<int> main() {
    b.printFormat<double>(1.123);
    b.printFormat<int>(543);
    b.printFormat<string[]>({"Hello", "World"});
}

I tried to implement this by moving the code that analyzes the function body and its children into an inner function and calling it every time I encounter a call to that particular function from anywhere (including from other source files). For some reason this does not work: I always get a segmentation fault. Maybe this is because the whole tree was already visited once?

For additional context: C++ source code of my visitor

Would appreciate some useful answers or tips, thank you! ;)


Solution

  • I don't think the best approach is to hack around with parsers. Parsers should turn one array of characters into one AST.

    In your case, you've got a fairly complex but new language that uses multiple files. When you import B, you really want to import its AST. C++ historically relied on a literal #include, with all the parsing problems that brings, and is only now getting modules. Languages like Java did away with textual inclusion, but had generics retrofitted later on. You've got a clean slate. You should design your language such that the compiler can just take a bunch of ASTs as its input.

    Since the compiler will take ASTs as input, each AST will be read-only. You can of course keep a cache of instantiations so you don't need to re-instantiate printFormat<int> every time you encounter it in an AST, but that's a detail.

    What's not a detail is how instantiation should work in your language. A common mistake is the assumption that C++ templates work like macros, at the text level. That's not the case; they work at the language level. Yours should also work at the language level. It would be really convenient for you if instantiation took an AST (or at least a subtree thereof) and produced a new AST for the instantiation, again read-only. It's no coincidence that the C++ template meta-language is effectively a functional language. These kinds of problems become much easier the more you can make read-only.
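The idea of instantiating from a read-only AST and caching the results could be sketched roughly as follows. This is only an illustration under assumptions: Node, makeNode, instantiate, TypeBindings and InstantiationCache are hypothetical names, not part of ANTLR4 or of the asker's compiler, and a real AST would carry far more information than a kind and a text field.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, minimal AST node. Nodes are only ever reached through
// pointers-to-const, so a tree cannot be modified once it is built.
struct Node {
    std::string kind;  // e.g. "FunctionDef", "Param", "TypeRef"
    std::string text;  // identifier or type name
    std::vector<std::shared_ptr<const Node>> children;
};
using NodePtr = std::shared_ptr<const Node>;

// Maps generic type names to concrete type names, e.g. T -> int.
using TypeBindings = std::map<std::string, std::string>;

NodePtr makeNode(std::string kind, std::string text,
                 std::vector<NodePtr> children = {}) {
    return std::make_shared<Node>(
        Node{std::move(kind), std::move(text), std::move(children)});
}

// Pure function: returns a new tree with bound type names substituted.
// The input tree is never modified; unchanged subtrees are shared.
NodePtr instantiate(const NodePtr &node, const TypeBindings &bindings) {
    assert(node && "cannot instantiate a null subtree");
    std::string text = node->text;
    bool changed = false;
    if (node->kind == "TypeRef") {
        auto it = bindings.find(text);
        if (it != bindings.end()) {
            text = it->second;
            changed = true;
        }
    }
    std::vector<NodePtr> children;
    children.reserve(node->children.size());
    for (const auto &child : node->children) {
        NodePtr newChild = instantiate(child, bindings);
        changed = changed || (newChild != child);
        children.push_back(std::move(newChild));
    }
    if (!changed) return node;  // nothing was substituted below this node
    return makeNode(node->kind, std::move(text), std::move(children));
}

// Remembers one instantiation per (function name, type bindings) pair.
class InstantiationCache {
public:
    NodePtr get(const std::string &name, const NodePtr &generic,
                const TypeBindings &bindings) {
        auto key = std::make_pair(name, bindings);
        auto it = cache_.find(key);
        if (it != cache_.end()) return it->second;  // already instantiated
        NodePtr result = instantiate(generic, bindings);
        cache_.emplace(std::move(key), result);
        ++misses_;
        return result;
    }
    std::size_t misses() const { return misses_; }

private:
    std::map<std::pair<std::string, TypeBindings>, NodePtr> cache_;
    std::size_t misses_ = 0;
};
```

With this shape, a call such as b.printFormat<int>(543) in source file A would look up the pair ("printFormat", {T -> int}) in the cache and type-check the returned tree, while the AST of source file B stays untouched. Because nothing is mutated, it does not matter how often or from which file the generic function is instantiated.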