c++ boost-spirit boost-phoenix

Boost::Spirit Expression Parser


I have another problem with my boost::spirit parser.

template<typename Iterator>
struct expression: qi::grammar<Iterator, ast::expression(), ascii::space_type> {
    expression() :
        expression::base_type(expr) {
        number %= lexeme[double_];
        varname %= lexeme[alpha >> *(alnum | '_')];

        binop = (expr >> '+' >> expr)[_val = construct<ast::binary_op<ast::add>>(_1,_2)]
              | (expr >> '-' >> expr)[_val = construct<ast::binary_op<ast::sub>>(_1,_2)]
              | (expr >> '*' >> expr)[_val = construct<ast::binary_op<ast::mul>>(_1,_2)]
              | (expr >> '/' >> expr)[_val = construct<ast::binary_op<ast::div>>(_1,_2)] ;

        expr %= number | varname | binop;
    }

    qi::rule<Iterator, ast::expression(), ascii::space_type> expr;
    qi::rule<Iterator, ast::expression(), ascii::space_type> binop;
    qi::rule<Iterator, std::string(), ascii::space_type> varname;
    qi::rule<Iterator, double(), ascii::space_type> number;
};

This was my parser. It parsed "3.1415" and "var" just fine, but when I tried to parse "1+2" it told me the parse failed. I then tried changing the binop rule to

    binop = expr >>
           (('+' >> expr)[_val = construct<ast::binary_op<ast::add>>(_1, _2)]
          | ('-' >> expr)[_val = construct<ast::binary_op<ast::sub>>(_1, _2)]
          | ('*' >> expr)[_val = construct<ast::binary_op<ast::mul>>(_1, _2)]
          | ('/' >> expr)[_val = construct<ast::binary_op<ast::div>>(_1, _2)]);

But now it's of course not able to build the AST, because _1 and _2 are set differently. I have only seen something like _r1 mentioned, but as a boost-Newbie I am not quite able to understand how boost::phoenix and boost::spirit interact.

How to solve this?


Solution

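  • It isn't entirely clear to me what you are trying to achieve. Most importantly, are you not worried about operator associativity? I'll just show simple answers based on right-recursion, which parses every operator as right-associative (see the grouping in the debug output further down). As for why "1+2" fails with your original grammar: expr tries number first, number matches "1", and the alternative stops there, so "+2" is never consumed; binop would only be tried when number and varname both fail.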

    The straight answer to your visible question would be to juggle a fusion::vector2<char, ast::expression>, which isn't really any fun, especially in Phoenix lambda semantic actions. (I'll show below what that looks like.)

    Meanwhile, I think you should read up on the Spirit docs:

    • the section on eliminating left recursion in the old Spirit docs. Though the syntax no longer applies, Spirit still generates LL recursive descent parsers, so the concept behind left-recursion still applies. The code below shows this applied to Spirit Qi.
    • the Qi examples, which contain three calculator samples. They should give you a hint on why operator associativity matters, and how you would express a grammar that captures the associativity of binary operators. They also show how to support parenthesized expressions to override the default evaluation order.

    Code:

    I have three versions of code that work, parsing input like:

    std::string input("1/2+3-4*5");
    

    into an ast::expression grouped like this (output from BOOST_SPIRIT_DEBUG):

    <expr>
      ....
      <success></success>
      <attributes>[[1, [2, [3, [4, 5]]]]]</attributes>
    </expr>
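
    To see why the grouping comes out nested to the right, here is a minimal hand-written sketch in plain C++ (not Spirit) of the same right-recursive shape, expr = simple op expr | simple. The names right_rec, parse_simple, parse_expr and grouping are made up for illustration; the sketch skips no whitespace, does no error handling, and returns the grouping as text rather than building an ast::expression.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical hand-written analog of the right-recursive grammar
//   expr = simple op expr | simple
// It is not Spirit code; it only reproduces the shape of the recursion.
namespace right_rec {

// simple: a run of letters, digits, dots or underscores (no validation).
inline std::string parse_simple(const std::string& in, std::size_t& pos) {
    const std::size_t start = pos;
    while (pos < in.size() &&
           (std::isalnum(static_cast<unsigned char>(in[pos])) ||
            in[pos] == '.' || in[pos] == '_')) {
        ++pos;
    }
    return in.substr(start, pos - start);
}

// expr: the right operand is parsed by recursing into expr itself,
// so everything after the first operator nests to the right.
inline std::string parse_expr(const std::string& in, std::size_t& pos) {
    assert(pos <= in.size());
    const std::string left = parse_simple(in, pos);
    if (pos < in.size() &&
        std::string("+-*/").find(in[pos]) != std::string::npos) {
        ++pos;  // consume the operator
        const std::string right = parse_expr(in, pos);
        return "[" + left + ", " + right + "]";
    }
    return left;
}

// Convenience wrapper: parse from the start of the input.
inline std::string grouping(const std::string& in) {
    std::size_t pos = 0;
    return parse_expr(in, pos);
}

}  // namespace right_rec
```

    Under those assumptions, right_rec::grouping("1/2+3-4*5") yields [1, [2, [3, [4, 5]]]], the same nesting as in the debug output: every operator binds to everything on its right, with no notion of precedence.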
    

    The three versions follow, step by step.

    Step 1: Reduce semantic actions

    First, I'd get rid of the per-operator alternatives; they lead to excessive backtracking (check that using BOOST_SPIRIT_DEBUG!). Also, as you've found out, they make the grammar hard to maintain. So, here is a simpler variation that uses a function for the semantic action:

    static ast::expression make_binop(char discriminant, 
         const ast::expression& left, const ast::expression& right)
    {
        switch(discriminant)
        {
            case '+': return ast::binary_op<ast::add>(left, right);
            case '-': return ast::binary_op<ast::sub>(left, right);
            case '/': return ast::binary_op<ast::div>(left, right);
            case '*': return ast::binary_op<ast::mul>(left, right);
        }
        throw std::runtime_error("unreachable in make_binop");
    }
    
    // rules:
    number %= lexeme[double_];
    varname %= lexeme[alpha >> *(alnum | '_')];
    
    simple = varname | number;
    binop = (simple >> char_("-+*/") >> expr) 
        [ _val = phx::bind(make_binop, qi::_2, qi::_1, qi::_3) ]; 
    
    expr = binop | simple;
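
    The make_binop function above relies on ast types that the question never shows. As a rough sketch of one shape they could take, here is a stand-in built on C++17 std::variant; the real code quite possibly uses boost::variant instead. Everything in the ast_sketch namespace (make_node, symbol, printer, show) is an assumption made for illustration only.

```cpp
#include <cassert>
#include <memory>
#include <sstream>
#include <stdexcept>
#include <string>
#include <variant>

// Hypothetical stand-in for the question's ast namespace (requires C++17).
namespace ast_sketch {

struct add {}; struct sub {}; struct mul {}; struct div {};

template <typename Tag> struct binary_op;  // defined below, after expression

// A number, a variable name, or a pointer to one of the four binary nodes.
using expression = std::variant<
    double, std::string,
    std::shared_ptr<binary_op<add>>, std::shared_ptr<binary_op<sub>>,
    std::shared_ptr<binary_op<mul>>, std::shared_ptr<binary_op<div>>>;

template <typename Tag>
struct binary_op {
    expression left;
    expression right;
};

template <typename Tag>
expression make_node(const expression& left, const expression& right) {
    return std::make_shared<binary_op<Tag>>(binary_op<Tag>{left, right});
}

// Same dispatch as the answer's make_binop: operator character -> node type.
inline expression make_binop(char discriminant, const expression& left,
                             const expression& right) {
    switch (discriminant) {
        case '+': return make_node<add>(left, right);
        case '-': return make_node<sub>(left, right);
        case '/': return make_node<div>(left, right);
        case '*': return make_node<mul>(left, right);
    }
    throw std::runtime_error("unreachable in make_binop");
}

inline char symbol(add) { return '+'; }
inline char symbol(sub) { return '-'; }
inline char symbol(mul) { return '*'; }
inline char symbol(div) { return '/'; }

// Visitor that prints the tree fully parenthesized.
struct printer {
    std::string operator()(double d) const {
        std::ostringstream os;
        os << d;
        return os.str();
    }
    std::string operator()(const std::string& name) const { return name; }
    template <typename Tag>
    std::string operator()(const std::shared_ptr<binary_op<Tag>>& op) const {
        assert(op != nullptr);
        return "(" + std::visit(*this, op->left) + symbol(Tag{}) +
               std::visit(*this, op->right) + ")";
    }
};

inline std::string show(const expression& e) { return std::visit(printer{}, e); }

}  // namespace ast_sketch
```

    With this stand-in, make_binop('+', 1.0, make_binop('*', std::string("x"), 2.0)) builds a tree that show renders as (1+(x*2)). Keeping the switch in one plain function is what lets the grammar get away with a single semantic action.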
    

    Step 2: Remove redundant rules, use _val

    As you can see, this has the potential to reduce complexity. It is now only a small step to remove the binop intermediate rule (which has become quite redundant):

    number %= lexeme[double_];
    varname %= lexeme[alpha >> *(alnum | '_')];
    
    simple = varname | number;
    expr = simple [ _val = _1 ] 
        > *(char_("-+*/") > expr) 
                [ _val = phx::bind(make_binop, qi::_1, _val, qi::_2) ]
        > eoi;
    

    As you can see,

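    • within the expr rule, the _val lazy placeholder is used as a pseudo-local variable that accumulates the binops. If the accumulator could not live in _val, you would declare a rule-local variable with qi::locals<ast::expression> (accessed as _a); to hand a value from one rule to another, you would use an inherited attribute (accessed as _r1, which is what your question touched on).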
    • there are now explicit expectation points, making the grammar more robust
    • the expr rule no longer needs to be an auto-rule (expr = instead of expr %=)
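
    The accumulator idea behind _val can also be sketched without Spirit. The plain C++ below is a variation, not the grammar shown above: it assumes the right-hand operand is restricted to simple instead of recursing into expr, which would roughly correspond to writing *(char_("-+*/") > simple) in the rule. With that change the same fold produces left-associative grouping, which is relevant if associativity matters to you. The names left_fold, parse_simple, is_op and grouping are hypothetical, and whitespace and errors are not handled.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical hand-written analog of
//   expr = simple { op simple }
// with a running accumulator playing the role of _val.
namespace left_fold {

// simple: a run of letters, digits, dots or underscores (no validation).
inline std::string parse_simple(const std::string& in, std::size_t& pos) {
    assert(pos <= in.size());
    const std::size_t start = pos;
    while (pos < in.size() &&
           (std::isalnum(static_cast<unsigned char>(in[pos])) ||
            in[pos] == '.' || in[pos] == '_')) {
        ++pos;
    }
    return in.substr(start, pos - start);
}

inline bool is_op(char c) {
    return c == '+' || c == '-' || c == '*' || c == '/';
}

// Returns the grouping as text instead of building an AST.
inline std::string grouping(const std::string& in) {
    std::size_t pos = 0;
    std::string val = parse_simple(in, pos);    // like simple[_val = _1]
    while (pos < in.size() && is_op(in[pos])) {
        ++pos;                                  // consume the operator
        const std::string right = parse_simple(in, pos);
        val = "[" + val + ", " + right + "]";   // like _val = make_binop(op, _val, right)
    }
    return val;
}

}  // namespace left_fold
```

    For the input "1/2+3-4*5" this sketch returns [[[[1, 2], 3], 4], 5]: each new operand wraps everything accumulated so far, so operators group to the left. Precedence between * and + is still ignored; the calculator samples in the Qi examples handle that with one rule per precedence level.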

    Step 0: Wrestle fusion types directly

    Finally, for fun and gore, let me show how you could have handled your suggested code, along with the shifting bindings of _1, _2, etc.:

    static ast::expression make_binop(
            const ast::expression& left, 
            const boost::fusion::vector2<char, ast::expression>& op_right)
    {
        switch(boost::fusion::get<0>(op_right))
        {
            case '+': return ast::binary_op<ast::add>(left, boost::fusion::get<1>(op_right));
            case '-': return ast::binary_op<ast::sub>(left, boost::fusion::get<1>(op_right));
            case '/': return ast::binary_op<ast::div>(left, boost::fusion::get<1>(op_right));
            case '*': return ast::binary_op<ast::mul>(left, boost::fusion::get<1>(op_right));
        }
        throw std::runtime_error("unreachable in make_binop");
    }
    
    // rules:
    number %= lexeme[double_];
    varname %= lexeme[alpha >> *(alnum | '_')];
    
    simple = varname | number;
    binop %= (simple >> (char_("-+*/") > expr)) 
        [ _val = phx::bind(make_binop, qi::_1, qi::_2) ]; // note _2!!!
    
    expr %= binop | simple;
    

    As you can see, not nearly as much fun writing the make_binop function that way!