c++, parsing, parser-generator

Parser generator that generates stand-alone C++ code


Is there an LALR parser generator that produces stand-alone C++ code? Ideally it would generate two files, named something like "Parser.cpp" and "Parser.hpp", with the parser implemented as a single class (which I can wrap in whatever namespace I like) that I can use for my parsing needs.

I want to use it for fun (i.e. small personal projects), and I'd like the output to be stand-alone (not depending on any external headers or libraries) so that I know I can compile it anywhere I have a C++ compiler.

The search so far:

I've looked at flex/bison, but AFAIK they both require special headers and libraries. I've also looked at ANTLR a little bit, but it is not obvious to me that it can generate stand-alone C++ code. If someone can confirm that it can, I might look into it further.


Solution

  • GOLD Parser (it is in the Wikipedia list of parser generators that Bart Kiers mentioned) supports C and C++. However, it does not generate a completely self-contained C/C++ source file: it only generates lexer/parser tables, which are then consumed by a separate "parsing engine".

    To accomplish your task (or something similar), I did the following:

    1. Prepare your LALR grammar in GOLD's grammar format.

    2. Generate the parsing tables (a single binary file).

    3. Use an old trick to convert the binary file into a header file containing a byte array (for example with xxd -i), like

      unsigned char ParseTable[] = { ... };

    4. Modify the table loader in the "parsing engine" sources so that it reads from that array instead of from a file (or, as I remember, use the C version of the engine, which supports loading from memory).

    5. Combine the GPEngine sources (if you are using a C++ engine) into a single .h/.cpp pair.

    6. Append the ParseTable array to the .cpp file.
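Steps 3, 4 and 6 boil down to baking the table file into the program as bytes and then reading those bytes back from memory instead of from disk. Below is a minimal C++ sketch of those pieces, under some assumptions: toCArray produces roughly the same shape of output as xxd -i; MemoryReader is a hypothetical stand-in for whatever stream the engine's loader actually reads from (it is not part of GOLD's API); and readUInt16 assumes 16-bit little-endian integers, which is my understanding of GOLD's table format but should be checked against the engine sources.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <iomanip>
#include <iterator>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Step 3, part 1: slurp the binary table file produced in step 2.
std::vector<unsigned char> readBinaryFile(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    if (!in) throw std::runtime_error("cannot open " + path);
    return std::vector<unsigned char>(std::istreambuf_iterator<char>(in),
                                      std::istreambuf_iterator<char>());
}

// Step 3, part 2: render the bytes as an array definition, 12 bytes per line,
// followed by a length variable (roughly the shape "xxd -i" emits).
std::string toCArray(const std::vector<unsigned char>& bytes, const std::string& name) {
    assert(!bytes.empty() && "a zero-length array is not valid C++");
    std::ostringstream out;
    out << "unsigned char " << name << "[] = {";
    for (std::size_t i = 0; i < bytes.size(); ++i) {
        if (i != 0) out << ",";
        out << (i % 12 == 0 ? "\n    " : " ");
        out << "0x" << std::hex << std::setw(2) << std::setfill('0')
            << static_cast<unsigned>(bytes[i]);
    }
    out << "\n};\nunsigned int " << name << "_len = " << std::dec << bytes.size() << ";\n";
    return out.str();
}

// Step 4 (hypothetical interface): a cursor over the embedded array that a
// modified loader could read from instead of a file stream.
class MemoryReader {
public:
    MemoryReader(const unsigned char* data, std::size_t size) : data_(data), size_(size) {}

    bool atEnd() const { return pos_ >= size_; }

    std::uint8_t readByte() {
        if (pos_ >= size_) throw std::out_of_range("read past end of table");
        return data_[pos_++];
    }

    // Assumed encoding: unsigned 16-bit integer, low byte first.
    std::uint16_t readUInt16() {
        const std::uint16_t lo = readByte();
        const std::uint16_t hi = readByte();
        return static_cast<std::uint16_t>(lo | (hi << 8));
    }

private:
    const unsigned char* data_;
    std::size_t size_;
    std::size_t pos_ = 0;
};
```

A combine script could call readBinaryFile on the table file, write the result of toCArray(bytes, "ParseTable") into the generated .cpp, and have the modified loader construct MemoryReader(ParseTable, ParseTable_len).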

    Sure, it's not that straightforward, but in principle all of these steps can be done by a single "combine" script that can be reused for any number of grammars.

    I guess the major drawback is that GOLD is closed-source and Windows-only, which means you need a Windows machine to produce the parsing tables.
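To make the split between generated tables and a generic "parsing engine" more concrete, here is a toy sketch of what the engine side does: a shift/reduce loop that knows nothing about the grammar except what the tables tell it. The grammar, the table layout and all names (kAction, kGotoE, parse) are my own for illustration; GOLD's engine and table format are different, and real tables are generated rather than written by hand.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Toy grammar; the tables below were computed by hand (SLR(1), which for this
// grammar coincides with LALR(1)):
//   rule 0: S' -> E      rule 1: E -> E '+' num      rule 2: E -> num
enum Terminal { NUM = 0, PLUS = 1, END = 2 };
struct Token { Terminal kind; long value; };

enum ActKind { ERR, SHIFT, REDUCE, ACCEPT };
struct Action { ActKind kind; int arg; };  // arg: target state (shift) or rule (reduce)

const Action kAction[5][3] = {
    //        num          '+'          end of input
    /* 0 */ {{SHIFT, 2},  {ERR, 0},    {ERR, 0}},
    /* 1 */ {{ERR, 0},    {SHIFT, 3},  {ACCEPT, 0}},
    /* 2 */ {{ERR, 0},    {REDUCE, 2}, {REDUCE, 2}},
    /* 3 */ {{SHIFT, 4},  {ERR, 0},    {ERR, 0}},
    /* 4 */ {{ERR, 0},    {REDUCE, 1}, {REDUCE, 1}},
};
const int kGotoE[5] = {1, -1, -1, -1, -1};  // GOTO(state, E); -1 = never used
const int kRuleLength[3] = {1, 3, 1};       // right-hand-side lengths

std::vector<Token> tokenize(const std::string& s) {
    std::vector<Token> out;
    std::size_t i = 0;
    while (i < s.size()) {
        const unsigned char c = static_cast<unsigned char>(s[i]);
        if (std::isspace(c)) {
            ++i;
        } else if (c == '+') {
            out.push_back({PLUS, 0});
            ++i;
        } else if (std::isdigit(c)) {
            long v = 0;
            while (i < s.size() && std::isdigit(static_cast<unsigned char>(s[i])))
                v = v * 10 + (s[i++] - '0');
            out.push_back({NUM, v});
        } else {
            throw std::runtime_error("unexpected character");
        }
    }
    out.push_back({END, 0});
    return out;
}

// Generic shift/reduce loop: everything grammar-specific is in the tables,
// except the semantic action run on each reduce.
long parse(const std::string& input) {
    const std::vector<Token> toks = tokenize(input);
    std::vector<int> states{0};
    std::vector<long> values{0};  // parallel to states; slot 0 is a placeholder
    std::size_t pos = 0;
    for (;;) {
        const Action a = kAction[states.back()][toks[pos].kind];
        switch (a.kind) {
        case SHIFT:
            states.push_back(a.arg);
            values.push_back(toks[pos].value);
            ++pos;
            break;
        case REDUCE: {
            // Rule 1 adds the E and num values; rule 2 passes num through.
            const long result = (a.arg == 1)
                ? values[values.size() - 3] + values.back()
                : values.back();
            const std::size_t n = kRuleLength[a.arg];
            states.resize(states.size() - n);
            values.resize(values.size() - n);
            const int target = kGotoE[states.back()];  // E is the only left-hand side
            assert(target >= 0);
            states.push_back(target);
            values.push_back(result);
            break;
        }
        case ACCEPT:
            return values.back();
        default:  // ERR
            throw std::runtime_error("syntax error");
        }
    }
}
```

For example, parse("1 + 2 + 3") returns 6, and parse("1 +") throws std::runtime_error. Supporting a different grammar means replacing only the tables and the per-rule actions, which is the property that makes the table-embedding approach above workable.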