Tags: parsing, compiler-construction, grammar, interpreter, yacc

How to replace macros with a grammar-based parser?


I need a parser for an exotic programming language. I wrote a grammar for it and used a parser generator (PEG.js) to generate the parser. That works perfectly, except for one thing: macros (which replace a placeholder with predefined text). I don't know how to integrate them into a grammar. Let me illustrate the problem:

An example program to be parsed typically looks like this:

instructionA parameter1, parameter2
instructionB parameter1
instructionC parameter1, parameter2, parameter3

No problem so far. But the language also supports macros:

Define MacroX { foo, bar }
instructionD parameter1, MacroX, parameter4

Define MacroY(macroParameter1, macroParameter2) {
  instructionE parameter1, macroParameter1
  instructionF macroParameter2, MacroX
}

instructionG parameter1, MacroX
MacroY

Of course, I could extend the grammar to recognize macro definitions and references to macros. But then I don't know how to parse the contents of a macro, because it isn't clear what a macro contains. It could be a single parameter (the easiest case), several parameters (like MacroX in my example, which stands for two parameters), or a whole block of instructions (like MacroY). Macros can even contain other macros. How do I express this in a grammar when it isn't clear what kind of construct a macro stands for?

The easiest approach seems to be to run a preprocessor that replaces all the macros first, and only then run the parser. But then the line numbers are off. I want the parser to report the line number when there is a parse error, and after preprocessing, the line numbers in the expanded text no longer match those in the original source.
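A preprocessor does not have to lose positions if it records, for every line it emits, where that line came from. The sketch below is a minimal Python illustration of that idea, assuming a simplified version of the macro syntax shown above (inline macros on one line, block macros whose body ends at a line containing only `}`). The names `preprocess`, `describe` and `OutLine` are hypothetical and are not part of PEG.js.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

# Hypothetical, simplified syntax modelled on the question's examples:
#   Define NAME { text }         inline macro, substituted word by word
#   Define NAME(p1, p2) {        block macro; body ends at a line that is just "}"
#   NAME  or  NAME(a, b)         block macro call on a line of its own
DEFINE_INLINE = re.compile(r"^\s*Define\s+(\w+)\s*\{(.*)\}\s*$")
DEFINE_BLOCK = re.compile(r"^\s*Define\s+(\w+)\s*(?:\((.*?)\))?\s*\{\s*$")
CALL = re.compile(r"^\s*(\w+)\s*(?:\((.*?)\))?\s*$")
WORD = re.compile(r"\b\w+\b")
MAX_DEPTH = 50  # guards against macros that (indirectly) expand to themselves

@dataclass
class Macro:
    params: list[str]
    body: list[tuple[str, int]]  # (text, source line) for every body line
    block: bool

@dataclass
class OutLine:
    text: str                # expanded text that is handed to the parser
    line: int                # source line on which this text was written
    called_from: int | None  # top-level macro call that produced it, if any

def split_args(text: str | None) -> list[str]:
    return [a.strip() for a in text.split(",")] if text and text.strip() else []

def substitute(text: str, macros: dict[str, Macro], env: dict[str, str], depth: int = 0) -> str:
    """Replace macro parameters and inline macros inside one line."""
    if depth > MAX_DEPTH:
        raise ValueError("macro expansion nested too deeply")

    def repl(m: re.Match) -> str:
        word = m.group(0)
        if word in env:
            return env[word]
        macro = macros.get(word)
        if macro is not None and not macro.block:
            return substitute(macro.body[0][0], macros, {}, depth + 1)
        return word

    return WORD.sub(repl, text)

def emit(text: str, line: int, called_from: int | None, macros: dict[str, Macro],
         env: dict[str, str], out: list[OutLine], depth: int) -> None:
    """Expand one line, appending the resulting line(s) to out."""
    if depth > MAX_DEPTH:
        raise ValueError(f"line {line}: macro expansion nested too deeply")
    call = CALL.match(text)
    macro = macros.get(call.group(1)) if call else None
    if macro is None or not macro.block:
        out.append(OutLine(substitute(text, macros, env), line, called_from))
        return
    args = [substitute(a, macros, env) for a in split_args(call.group(2))]
    inner_env = dict(zip(macro.params, args))  # missing arguments stay unexpanded
    site = called_from if called_from is not None else line
    for body_text, body_line in macro.body:
        emit(body_text, body_line, site, macros, inner_env, out, depth + 1)

def preprocess(source: str) -> list[OutLine]:
    """Expand all macros while recording a source line for every output line."""
    macros: dict[str, Macro] = {}
    out: list[OutLine] = []
    lines = source.splitlines()
    i = 0
    while i < len(lines):
        text, line = lines[i], i + 1
        i += 1
        if m := DEFINE_INLINE.match(text):
            macros[m.group(1)] = Macro([], [(m.group(2).strip(), line)], block=False)
        elif m := DEFINE_BLOCK.match(text):
            body = []
            while i < len(lines) and lines[i].strip() != "}":
                body.append((lines[i], i + 1))
                i += 1
            i += 1  # skip the closing "}"
            macros[m.group(1)] = Macro(split_args(m.group(2)), body, block=True)
        else:
            emit(text, line, None, macros, {}, out, 0)
    return out

def describe(out: list[OutLine], parser_line: int) -> str:
    """Map a 1-based line number reported on the expanded text back to the source."""
    entry = out[parser_line - 1]
    where = f"line {entry.line}"
    if entry.called_from is not None:
        where += f" (expanded from the macro call on line {entry.called_from})"
    return where
```

The parser would then run on `"\n".join(o.text for o in out)`; when it reports an error on line n of that text, `describe(out, n)` gives the position in the file the user actually wrote. This sketch only maps lines: column numbers still shift wherever an inline macro was substituted, and nested braces or `Define` inside a macro body are not handled.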

Help very much appreciated.


Solution

  • Macro processors tend not to respect the boundaries of language elements; in essence, they can often make arbitrary changes to the apparent input string.

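    If this is the case, you have little choice: you'll need to build a macro processor that preserves line numbers, for example by recording which source line each output line came from, so that errors reported on the expanded text can be mapped back.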

    If the macros always contain well-structured language elements, and they always occur in structured places in the code, then you can add macro definitions and macro calls to your grammar. This may make your parses ambiguous: foo(x) in C code might be a macro call, or it might be a function call. You'll have to resolve that ambiguity somehow. C parsers used to solve such ambiguity problems by collecting symbol table information as they parsed; if you record whether foo is a macro as you parse, you can decide whether foo(x) is a macro call or not.
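The "record whether foo is a macro as you parse" idea can be sketched with a small hand-written parser. This is a hypothetical Python illustration, not PEG.js code, and it assumes the simplified syntax from the question with macros defined before use. The parser keeps a symbol table (`self.macros`) of every `Define` it has seen and consults it to decide whether a name at the start of a line is a block-macro call and whether an argument is a macro reference. Because it parses the original text, every node keeps its real line number, and expansion can happen later on the tree. In a PEG.js grammar the equivalent check would presumably be a semantic predicate (`&{ ... }`) that consults such a table; actions with side effects need care if the parser backtracks.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

class ParseError(Exception):
    pass

# Hypothetical AST node types; every node keeps the source line it came from.
@dataclass
class MacroDef:
    name: str
    params: list[str]
    body: list    # parsed statements (block macro) or arguments (inline macro)
    block: bool
    line: int

@dataclass
class MacroRef:   # a macro used where an argument is expected, e.g. MacroX
    name: str
    line: int

@dataclass
class MacroCall:  # a block macro used as a statement, e.g. MacroY(a, b)
    name: str
    args: list[str]
    line: int

@dataclass
class Instruction:
    op: str
    args: list    # plain strings or MacroRef nodes
    line: int

class Parser:
    def __init__(self, source: str) -> None:
        self.lines = source.splitlines()
        self.pos = 0
        self.macros: dict[str, MacroDef] = {}  # symbol table, filled in while parsing

    def parse(self) -> list:
        return self.block(closing=False)

    def block(self, closing: bool) -> list:
        stmts = []
        while self.pos < len(self.lines):
            line, text = self.pos + 1, self.lines[self.pos].strip()
            self.pos += 1
            if closing and text == "}":
                return stmts
            if text:
                stmts.append(self.statement(text, line))
        if closing:
            raise ParseError("macro body not closed before end of input")
        return stmts

    def statement(self, text: str, line: int):
        if m := re.fullmatch(r"Define\s+(\w+)\s*\{(.*)\}", text):
            return self.define(MacroDef(m[1], [], self.arguments(m[2], line), False, line))
        if m := re.fullmatch(r"Define\s+(\w+)\s*(?:\((.*)\))?\s*\{", text):
            params = [p.strip() for p in (m[2] or "").split(",") if p.strip()]
            return self.define(MacroDef(m[1], params, self.block(closing=True), True, line))
        m = re.fullmatch(r"(\w+)\s*(?:\((.*)\))?", text)
        known = self.macros.get(m[1]) if m else None
        if known is not None and known.block:  # the "is this name a macro?" check
            args = [a.strip() for a in (m[2] or "").split(",") if a.strip()]
            return MacroCall(m[1], args, line)
        m = re.fullmatch(r"(\w+)(?:\s+(.*))?", text)
        if m is None:
            raise ParseError(f"line {line}: cannot parse {text!r}")
        return Instruction(m[1], self.arguments(m[2] or "", line), line)

    def define(self, node: MacroDef) -> MacroDef:
        self.macros[node.name] = node  # from now on this name is parsed as a macro
        return node

    def arguments(self, text: str, line: int) -> list:
        if not text.strip():
            return []
        args = []
        for part in (p.strip() for p in text.split(",")):
            if not part:
                raise ParseError(f"line {line}: empty argument")
            args.append(MacroRef(part, line) if part in self.macros else part)
        return args
```

The decision depends on order, as with the C symbol-table approach: in this sketch, `MacroY` on a line before its `Define` parses as an ordinary instruction, while the same line after the definition parses as a `MacroCall`.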