c++ parsing boost c++11 boost-spirit

Can Boost.Spirit be theoretically/practically used to parse C++(0x) (or any other language)?


Is it theoretically up to the task?

Can it be done practically, and would the resulting parser have sufficient performance and produce suitable output (say, LLVM IR or GCC's GIMPLE) to be integrated into a competing compiler?


Solution

  • No. C++ is too hard to parse for most automatic tools, and in practice it is usually parsed by hand-written parsers. [Edit 1-Mar-2015: Added 'most' and 'usually'.]

    Among the hard problems are:

    • A * B; which could be either the definition of a variable B of type A* or just the multiplication of two variables A and B, depending on whether A names a type.
    • A < B > C > D: where does the template argument list of A<> end? The usual 'max-munch' rules for parsing expressions do not work here.
    • vector<shared_ptr<int>>, where the >> closes two template argument lists, which is hard to do with only one token (and a space in between is allowed). But in 1>>15 no space is allowed.

    And I bet that this list is far from complete.
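The three cases listed above can be reproduced in a small compilable sketch (C++11 or later). All names here (a_is_type, a_is_object, Num, and so on) are made up for illustration; the point is only that the same token sequence is read differently depending on what the identifiers were declared as. Some compilers will warn about the unused variable and the chained comparison, which is expected.

```cpp
#include <cassert>
#include <memory>
#include <type_traits>
#include <vector>

// Case 1a: A names a type, so `A * B;` declares a pointer named B.
namespace a_is_type {
struct A {};
bool declares_pointer() {
    A * B;  // declaration: B has type A* (left uninitialized on purpose)
    return std::is_same<decltype(B), A*>::value;
}
}  // namespace a_is_type

// Case 1b: A and B name objects, so `A * B;` is an expression statement.
namespace a_is_object {
struct Num { int v; };
int multiplications = 0;  // counts calls to the overloaded operator*
Num operator*(Num x, Num y) { ++multiplications; return Num{x.v * y.v}; }
int evaluates_product() {
    Num A{6}, B{7};
    int before = multiplications;
    A * B;  // expression: calls operator*, result is discarded
    return multiplications - before;  // number of calls made by the line above
}
}  // namespace a_is_object

// Case 2a: A names a template; parentheses are needed so that the first `>`
// is not taken as the end of the template argument list.
namespace a_is_template {
template <bool> struct A {};
constexpr int B = 2, C = 1;
A<(B > C)> D;  // D has type A<true>
}  // namespace a_is_template

// Case 2b: A names a variable; the same operators form a chained comparison.
namespace a_is_variable {
bool chained_comparison() {
    int A = 1, B = 2, C = 0, D = 0;
    return A < B > C > D;  // parsed as ((A < B) > C) > D
}
}  // namespace a_is_variable

// Case 3: `>>` closes two template argument lists here (allowed since C++11)...
std::vector<std::shared_ptr<int>> nested;
// ...but is a single right-shift token here.
static_assert((1 >> 15) == 0, "here >> is the right-shift operator");

// Small self-check tying the cases together.
void run_checks() {
    assert(a_is_type::declares_pointer());
    assert(a_is_object::evaluates_product() == 1);
    assert(a_is_variable::chained_comparison());
}
```

In every pair the parser cannot pick a reading from the tokens alone; it has to know what A was declared as earlier in the translation unit.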

    Addition: The grammar is available, but it is ambiguous and thus not valid as input to tools like Spirit.

    Update 1-Mar-2015: As Ira Baxter, a well-known expert in this field, points out in the comments, there are some parser generators that can produce a parser that builds the full parse forest. As far as I know, selecting the right parse still requires a semantic phase. I'm not aware of any non-commercial parser generators that can do so for C++'s grammar. For more information, see this answer.
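To make the "parse forest plus semantic phase" idea concrete, here is a deliberately tiny sketch for a statement of the form X * Y;. It is not how any real parser generator works. The names Reading, all_parses and select_parse are hypothetical, and the "forest" is hard-coded to the two readings discussed above. It only illustrates that the syntactic phase can keep both readings and a later phase with a symbol table picks one.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// The two grammatical readings of a statement shaped like `X * Y;`.
enum class Reading { Declaration, Multiplication };

// "Syntactic phase" (hypothetical): both readings are valid by the grammar
// alone, so both are kept, as a forest-building parser would do.
std::vector<Reading> all_parses() {
    return {Reading::Declaration, Reading::Multiplication};
}

// "Semantic phase" (hypothetical): keep the reading that agrees with the
// set of names currently known to be types.
Reading select_parse(const std::string& lhs,
                     const std::set<std::string>& type_names) {
    const bool lhs_is_type = type_names.count(lhs) != 0;
    for (Reading r : all_parses()) {
        if (r == Reading::Declaration && lhs_is_type) return r;
        if (r == Reading::Multiplication && !lhs_is_type) return r;
    }
    return Reading::Multiplication;  // not reached: one reading always matches
}

// Small self-check: the same statement resolves differently per symbol table.
void check_selection() {
    std::set<std::string> with_type_a;
    with_type_a.insert("A");
    std::set<std::string> no_types;
    assert(select_parse("A", with_type_a) == Reading::Declaration);
    assert(select_parse("A", no_types) == Reading::Multiplication);
}
```

A real C++ front end has the additional problem that the symbol table itself depends on earlier parsing and on template instantiation, which this sketch ignores entirely.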