How to parse reserved words correctly in boost spirit


I'm trying to parse a sequence of the form <direction> <type> <name>. For example:

in float foo

where the direction can be either in, out, or in_out. I've succeeded in parsing correct text by using a qi::symbols class to convert the direction keywords to an enum.

However, the problem appears when the input is not valid. Take this example:

int foo

The symbol table parser will accept the 'in' part of the 'int' type, so the results will be:

direction: in
type: t
name: foo

As a result, the error goes undetected. What's the best way to parse the reserved words in, out and in_out while ensuring that each is followed by a non-identifier character, so that the 'int' in the text above is not accepted as a direction?

Thanks


Solution

  • In addition to the "manual" approach suggested by Mike, you can:

    1. use a convenience wrapper rule
    2. use the distinct parser directive from the Spirit Repository

    1. Use a convenience wrapper

    I just remembered, I once came up with this quick and dirty helper:

    static const qi::rule<It, qi::unused_type(const char*)> kw 
          = qi::lit(qi::_r1) >> !qi::alnum;
    

    Which you could use like this (the unary + in +"lit" decays the array reference to const char*):

    stmt = 
             kw(+"if") >> '(' >> expr >> ')' >> block
         >> -(kw(+"else") >> block)
         ;
    

    You can make it considerably more convenient:

    template <std::size_t N>
    static auto kw(char const (&keyword)[N]) -> qi::rule<Iterator> {
        // qi::lit has problems with char arrays, use pointer instead.
        return qi::lit(+keyword) >> !qi::alnum;
    }
    

    So you can write:

    kw_if   = kw("if");
    kw_then = kw("then");
    kw_else = kw("else");
    kw_and  = kw("and");
    kw_or   = kw("or");
    kw_not  = kw("not");
    

    2. Use the distinct directive from the Spirit Repository

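    The distinct directive from the Spirit Repository performs this lookahead for you. The snippet below appears to come from the repository's own test code, so distinct::keyword and test are helpers defined there rather than part of the library interface: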

    int main()
    {
        using namespace spirit_test;
        using namespace boost::spirit;
    
        {
            using namespace boost::spirit::ascii;
    
            qi::rule<char const*, space_type> r;
            r = distinct::keyword["description"] >> -lit(':') >> distinct::keyword["ident"];
    
            BOOST_TEST(test("description ident", r, space));
            BOOST_TEST(test("description:ident", r, space));
            BOOST_TEST(test("description: ident", r, space));
            BOOST_TEST(!test("descriptionident", r, space));
        }
    
        return boost::report_errors();
    }