c++ parsing boost boost-spirit parser-generator

What are the disadvantages of the Spirit parser-generator framework from boost.org?


In several questions I've seen recommendations for the Spirit parser-generator framework from boost.org, but then in the comments there is grumbling from people using Spirit who are not happy. Will those people please stand forth and explain to the rest of us what the drawbacks or downsides of using Spirit are?


Solution

  • It is a quite cool idea, and I liked it; it was especially useful to really learn how to use C++ templates.

    But their documentation recommends using Spirit for small to medium-size parsers. A parser for a full language would take ages to compile. I will list three reasons.

    • Scannerless parsing. While it's simpler, it may slow down the parser when backtracking is required. It's optional though: a lexer can be integrated, see the C preprocessor built with Spirit.

    • Slow parsing - there is no static checking of the grammar, neither to warn about excessive lookahead nor to catch basic errors such as left recursion (which leads to infinite recursion in recursive-descent, i.e. LL, parsers). Left recursion is not a really hard bug to track down, but excessive lookahead might cause exponential parsing times.

    • Heavy template usage - while this has certain advantages, it impacts compilation times and code size. Additionally, the grammar definition must normally be visible to all of its users, which increases compilation times even more. I've been able to move grammars to .cpp files by adding explicit template instantiations with the right parameters, but it was not easy. A grammar of ~300 lines (including both .h and .cpp files) compiles (unoptimized) to a file of 6 MB with GCC. Inlining and maximum optimizations get that down to ~1.7 MB.
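
To illustrate the explicit-instantiation trick from the last bullet, here is a minimal sketch in plain C++. It does not use Spirit at all: `count_leading_digits` is a hypothetical stand-in for a templated grammar, and the header/.cpp split is only marked by comments so the example fits in one file. The real parameters needed for a Spirit grammar depend on the scanner or iterator types in use, which is the hard part the answer alludes to.

```cpp
// ---- grammar.h: what every other translation unit includes ----
#include <cstddef>
#include <string>

// Only the declaration is visible to users of the header.
template <typename Iterator>
std::size_t count_leading_digits(Iterator first, Iterator last);

// Explicit instantiation declarations: includers must not instantiate
// these themselves; grammar.cpp provides them.
extern template std::size_t count_leading_digits<const char*>(
    const char*, const char*);
extern template std::size_t count_leading_digits<std::string::const_iterator>(
    std::string::const_iterator, std::string::const_iterator);

// ---- grammar.cpp: the template body is compiled exactly once ----
template <typename Iterator>
std::size_t count_leading_digits(Iterator first, Iterator last) {
    std::size_t n = 0;
    while (first != last && *first >= '0' && *first <= '9') {
        ++first;
        ++n;
    }
    return n;
}

// Explicit instantiation definitions, one per iterator type that
// callers are allowed to use.
template std::size_t count_leading_digits<const char*>(
    const char*, const char*);
template std::size_t count_leading_digits<std::string::const_iterator>(
    std::string::const_iterator, std::string::const_iterator);
```

Callers that include only the header part can use the function with `const char*` or `std::string::const_iterator`; any other iterator type fails at link time, which is the price of hiding the definition.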

    UPDATE: my response is limited to my experience with Spirit classic, not Spirit V2. I would still expect Spirit to be heavily template-based, but now I'm just guessing.
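
The left-recursion pitfall mentioned in the "Slow parsing" bullet can be shown with a small hand-written recursive-descent parser. This is a sketch under assumptions, not Spirit code: the `Parser` struct and `evaluate` helper are hypothetical names. The left-recursive rule appears only in a comment, because a direct translation would recurse without consuming input; the working code uses the usual rewrite into a loop.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hand-written recursive-descent sketch (hypothetical names, no Spirit).
struct Parser {
    std::string input;
    std::size_t pos = 0;

    // number := digit+
    bool number(long& out) {
        std::size_t start = pos;
        out = 0;
        while (pos < input.size() &&
               std::isdigit(static_cast<unsigned char>(input[pos]))) {
            out = out * 10 + (input[pos] - '0');
            ++pos;
        }
        return pos > start;  // at least one digit was consumed
    }

    // Left-recursive form:   expr := expr '-' number | number
    // Translated literally, expr() would call expr() as its first action,
    // before consuming any input, and never return.
    //
    // Equivalent loop form:  expr := number ('-' number)*
    bool expr(long& out) {
        if (!number(out)) return false;
        while (pos < input.size() && input[pos] == '-') {
            std::size_t save = pos;
            ++pos;
            long rhs = 0;
            if (!number(rhs)) {
                pos = save;  // undo the '-' and stop
                break;
            }
            out -= rhs;  // folding left to right keeps '-' left-associative
        }
        return true;
    }
};

// Succeeds only if the whole string is a valid expression.
bool evaluate(const std::string& text, long& result) {
    Parser p;
    p.input = text;
    return p.expr(result) && p.pos == text.size();
}
```

With this rewrite, `evaluate("10-3-2", r)` succeeds and leaves `r == 5`, the same left-associative result the left-recursive rule describes.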