
How to parse a list with an optional separator at the end?


I'm writing a parser for a simple C header file that contains enums and structs. My Boost Spirit Qi parser almost does the job, but I'm stuck on one problem. I can work around it with a hack, but I'd like to know whether there is a cleaner way to solve it.

The enums I'm dealing with are simple. Here is an example:

enum <optional enum name>
{
   VALUE1,
   VALUE2 = 222,
   VALUE3
}

The code snippet that parses such enums:

IdParser %= lexeme[(alpha | '_') >> *(alnum | '_')];
EnumExprParser %= lexeme[+(char_ - (lit(",") | lit("}")))];
EnumValueParser %= IdParser >> -('=' >> EnumExprParser);
EnumParser %= lit("enum") >> -IdParser >> lit("{") >> (EnumValueParser % lit(",")) >> lit("}") >> -lit(";");

Notice that I parse the enum values as a comma-separated list. Sometimes, though, the last enum value is followed by a comma as well: VALUE3,. My dirty workaround is the following: *(EnumValueParser >> -lit(","))

But this also accepts several enum values with no separator between them. That is acceptable for me, but I'm interested in a cleaner solution. I'm parsing enums into the following structures:

struct EnumValue
{
    std::string Name;
    boost::optional<std::string> Value;
};

struct Enum
{
    boost::optional<std::string> Name;
    std::vector<EnumValue> Values;
};

Many thanks in advance!


Solution

  • One quickfix would be to replace

    EnumBody  = '{' >> EnumValue % "," >> '}';
    

    With

    EnumBody  = '{' >> -EnumValue % "," >> '}';
    

    Though that's sloppy, because it would allow enum X { a,,,b } as well. So, this would be more accurate:

    EnumBody  = '{' >> EnumValue % "," >> -lit(',') >> '}';
    

    NOTE: There's another catch you haven't spotted yet: an empty enum body should be allowed too (enum X {}), so let's fix that as well:

    EnumBody  = '{' >> -(EnumValue % ",") >> -lit(',') >> '}';
    

    Demo


    #include <boost/fusion/adapted/std_pair.hpp>
    #include <boost/fusion/adapted/struct.hpp>
    #include <boost/spirit/include/qi.hpp>
    #include <iostream>
    #include <string>
    #include <utility>
    #include <vector>
    
    namespace qi = boost::spirit::qi;
    
    namespace ast {
        using Id = std::string;
        using EnumEntry = std::pair<Id, std::string>;
        using EnumBody = std::vector<EnumEntry>;
    
        struct EnumDef {
            Id name;
            EnumBody members;
        };
    }
    
    BOOST_FUSION_ADAPT_STRUCT(ast::EnumDef, name, members)
    
    template <typename It, typename Skipper = qi::space_type>
    struct Parser : qi::grammar<It, ast::EnumDef(), Skipper> {
        Parser() : Parser::base_type(Enum) {
            using namespace qi;
    
            Id        = raw [(alpha | '_') >> *(alnum | '_')];
            EnumExpr  = +~char_(",}");
            EnumValue = Id >> -('=' >> EnumExpr);
            EnumBody  = '{' >> -(EnumValue % ",") >> -lit(',') >> '}';
            Enum      = "enum" >> -Id >> EnumBody >> -lit(';');
        }
      private:
        qi::rule<It, ast::EnumEntry(), Skipper> EnumValue;
        qi::rule<It, ast::EnumBody(),  Skipper> EnumBody;
        qi::rule<It, ast::EnumDef(),   Skipper> Enum;
        // lexemes:
        qi::rule<It, ast::Id()> Id, EnumExpr;
    };
    
    int main() {
        using It = boost::spirit::istream_iterator;
        It f(std::cin >> std::noskipws), l;
    
        bool ok = qi::phrase_parse(f, l, Parser<It>(), qi::space);
    
        if (ok) {
            std::cout << "Parse success\n";
        } else {
            std::cout << "Parse failed\n";
        }
    
        if (f != l)
            std::cout << "Remaining input: '" << std::string(f,l) << "'\n";
    }
    

    For the input

    enum NAME
    {
        VALUE1,
        VALUE2 = 222,
        VALUE3,
    }
    

    Prints

    Parse success
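
To make the differences between the body rules discussed above concrete, here is a minimal sketch of a hand-written recognizer in plain C++. It is deliberately not Spirit code; it only loosely mirrors each rule shape so the accepted inputs can be compared. All names (`Scanner`, `Variant`, `accepts`) are hypothetical and exist only for this illustration. The `Tight` variant does not come from the answer: it is an assumed refinement, `'{' >> -(EnumValue % ',' >> -lit(',')) >> '}'`, that has not been tested in Spirit itself.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical illustration, NOT Spirit code: each Variant mirrors one rule shape.
enum class Variant {
    Strict,         // '{' >> EnumValue % ',' >> '}'
    OptionalItems,  // '{' >> -EnumValue % ',' >> '}'
    TrailingComma,  // '{' >> -(EnumValue % ',') >> -lit(',') >> '}'
    Tight           // '{' >> -(EnumValue % ',' >> -lit(',')) >> '}'  (assumed refinement)
};

class Scanner {
  public:
    explicit Scanner(const std::string& in) : s(in), pos(0) {}

    // Like a literal under a space skipper: skip whitespace, then match one char.
    bool lit(char c) {
        skip_ws();
        if (pos < s.size() && s[pos] == c) { ++pos; return true; }
        return false;
    }

    // EnumValue = Id >> -('=' >> EnumExpr)
    bool enum_value() {
        skip_ws();
        if (pos >= s.size() || !(is_alpha(s[pos]) || s[pos] == '_')) return false;
        while (pos < s.size() && (is_alnum(s[pos]) || s[pos] == '_')) ++pos;
        std::size_t after_id = pos;
        if (lit('=')) {
            std::size_t start = pos;
            // Roughly +~char_(",}"): everything up to the next ',' or '}'.
            while (pos < s.size() && s[pos] != ',' && s[pos] != '}') ++pos;
            if (pos == start) pos = after_id;  // optional part failed: backtrack
        }
        return true;
    }

    // EnumValue % ',': one value, then (',' value)*; a dangling ',' is not consumed.
    bool list() {
        if (!enum_value()) return false;
        for (;;) {
            std::size_t save = pos;
            if (!(lit(',') && enum_value())) { pos = save; return true; }
        }
    }

    bool at_end() { skip_ws(); return pos == s.size(); }

  private:
    static bool is_alpha(char c) { return std::isalpha(static_cast<unsigned char>(c)) != 0; }
    static bool is_alnum(char c) { return std::isalnum(static_cast<unsigned char>(c)) != 0; }
    void skip_ws() {
        while (pos < s.size() && std::isspace(static_cast<unsigned char>(s[pos]))) ++pos;
    }

    const std::string& s;
    std::size_t pos;
};

// Returns true if `input` is a complete enum body under the chosen variant.
bool accepts(const std::string& input, Variant v) {
    Scanner sc(input);
    if (!sc.lit('{')) return false;
    switch (v) {
    case Variant::Strict:
        if (!sc.list()) return false;
        break;
    case Variant::OptionalItems:
        sc.enum_value();                      // every item may be empty
        while (sc.lit(',')) sc.enum_value();
        break;
    case Variant::TrailingComma:
        sc.list();                            // optional list...
        sc.lit(',');                          // ...then an independent optional comma
        break;
    case Variant::Tight:
        if (sc.list()) sc.lit(',');           // trailing comma only after a list
        break;
    }
    return sc.lit('}') && sc.at_end();
}
```

In this sketch, `{ a, b = 2, c, }` is accepted by every variant except `Strict`, and `{ a,,,b }` is accepted only by `OptionalItems`. A lone `{ , }` is accepted by `OptionalItems` and `TrailingComma` but rejected by `Tight`, which suggests the answer's final rule would let a lone comma through as well. If that matters for your input, nesting the optional comma inside the optional list is one possible way to tighten it.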