Search code examples
parsingcompilationrecursive-descentcompiler-construction

Generate output along with recursive descent parser


Writing simple parsers are trivial and I have implemented several over the years. In college, we also had to write one. But we never had to generate meaningful output using this method; we never learned how to create a back end.

If I have a working recursive descent parser for a simplified Pascal and I wanted to translate the code to C++, how would I go about doing this? I dont see a need for an intermediate step such as generating an abstract syntax tree.

So how do I go about outputting the compiled or translated code? The only useful example I have found of this is in Jack Crenshaw's tutorial but it focuses more on the front end, as does most other resources. The relationship between my parser code and my grammar is very obvious. What about the relationship between the parsers methods and the outputting? My parser method for a function declaration can relate to exactly one EmitLn(C++ code here) call. But what about parser methods that aren't that easy, such as expressions. Expressions are broken down to possibly many more calls, so that alludes to the need of a Emit() function that allows me to break down the output code for an expression piece by piece. Is there any boiler plate code for outputting code, such as the EmitLn function in Jack Crenshaw's Lets Build a Compiler? This also shows me I need to maintain a basic symbol table, another thing that is often omitted in most examples.

Am I right? What else should I know? Any tips/suggestions, or resources? My big question is, there are so many tutorials out there for compiler front ends, but how about some explanation on the back end? I can parse languages, now I want to evolve that into being able to translate and compile them into other languages.


Solution

  • There are trivial compilers using "on the fly" code generators that spit code as they parse, as you seem to be trying to do. The good news is that they look easy.

    The problem is that compilation is fundamentally about spreading information from one part of the code to another, to enable good code generation. For that, compiler people long ago decided to separate parsing from code generation. The parser parses the code and builds another intermediate data structure (often an abstract syntax tree or triples) that represents the program.

    Often, very deep analysis is then lavished on the intermediate structure to determine how to generate good code, followed by appropriate good code generation. The techniques for doing this well are complex, but well motivated by the need to produce good (and correct) code.

    Try checking out the standard compiler resources. Learning to write a compiler

    It is well worth your time to read some of these references in detail instead of hacking. If you insist on just one, Aho and Ullman is the classic book.

    If you want to see the how on the fly code generation can be made to work reasonably well, check out metacompilers.