Search code examples
c++boost-spiritboost-spirit-qi

Boost Spirit Qi Grammar for Synthesizing associative Binary Operator AST nodes?


I am trying to parse expressions of the form "1.0 + 2.0 + 3.0 + ..." into an AST. I have the following AST node for binary operations (complete, minimal code example is at the end):

struct binop_t
{
    expression_t lhs, rhs;
};

I want to use the "BOOST_FUSION_ADAPT_STRUCT" macro to allow this struct to be synthesized by a boost:spirit::qi::rule:

BOOST_FUSION_ADAPT_STRUCT
(
    client::ast::binop_t,
    (client::ast::expression_t, lhs)
    (client::ast::expression_t, rhs)
)

In other words, the binary operation AST node (binop_t) requires two operands - the left-hand side (lhs) and right-hand side (rhs) expressions that should be operated on. I am able to parse expressions of the form "1.0+(2.0+(3.0+4.0))" into this AST node by using the following qi::grammar:

qi::rule<Iterator, ast::literal_t(), ascii::space_type> literal;
qi::rule<Iterator, ast::binop_t(), ascii::space_type> binop;
qi::rule<Iterator, ast::expression_t(), ascii::space_type> primary_expr;
qi::rule<Iterator, ast::expression_t(), ascii::space_type> expr;

expr = binop.alias();

binop = primary_expr > qi::lit('+') > primary_expr;

primary_expr = (qi::lit('(') > expr > qi::lit(')')) 
             | literal
             ;

literal = qi::double_;

However, I am struggling to understand how to modify this grammar so that it can parse such expressions without the use of parenthesis (e.g. "1+2+3+4+...").

I have looked over the "calc4.cpp" Boost Spirit example, and noticed that it only uses the following AST node for Binary Operations (such as add):

struct operation
{
    optoken operator_;
    operand operand_;
};

The difference between this example and what I am trying to do is that the example defines the grammar for synthesizing binary operations node purely in terms of a list of unary operations. The list of unary operations is synthesized into an AST node called "program":

struct program
{
    operand first;
    std::list<operation> rest;
};

This whole thing in synthesized using the following grammar in the example:

    qi::rule<Iterator, ast::program(), ascii::space_type> expression;
    qi::rule<Iterator, ast::program(), ascii::space_type> term;
    qi::rule<Iterator, ast::operand(), ascii::space_type> factor;

        expression =
            term
            >> *(   (char_('+') >> term)
                |   (char_('-') >> term)
                )
            ;

        term =
            factor
            >> *(   (char_('*') >> factor)
                |   (char_('/') >> factor)
                )
            ;

        factor =
                uint_
            |   '(' >> expression >> ')'
            |   (char_('-') >> factor)
            |   (char_('+') >> factor)
            ;

In this grammar, the "expression" rule produces a "program" which is a list of operations". We can see from from the grammar rule for "expression" that it uses the Kleene star in the grammar:

*((char_('+') >> term)

This is how the grammar is able to parse chains of associative binary operations, such as "1+2+3+4+...". The attribute of this grammar is list, which matches the definition of the "program" AST node. The calculator "eval" function then simply iterates over the list of operations in "program," applying the operations to operands from left to right:

    int operator()(program const& x) const
    {
        int state = boost::apply_visitor(*this, x.first);
        BOOST_FOREACH(operation const& oper, x.rest)
        {
            state = (*this)(oper, state);
        }
        return state;
    }

I have also looked over the "mini-c" Boost Spirit example, and it has a very similar AST design, where there is no binary operator AST node (only a single "operator" node that accepts a single operand).

Following is the complete, minimal code example for the program that I've implemented by so far. To recap, my question is: how can I modify this program so that it is able to synthesize a tree of binop_t AST nodes from an expression like "1+2+3+4+..." without this use of parentheses in the input text:

#include <boost/variant.hpp>
#include <boost/fusion/include/adapt_struct.hpp>
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/phoenix_core.hpp>
#include <boost/spirit/include/phoenix_operator.hpp>
#include <iostream>
#include <string>
#include <exception>

using boost::variant;
using boost::recursive_wrapper;

namespace qi = boost::spirit::qi;
namespace ascii = boost::spirit::ascii;
namespace phoenix = boost::phoenix;

namespace client { namespace ast {

    struct literal_t;
    struct binop_t;

    typedef variant< recursive_wrapper<literal_t>,
                     recursive_wrapper<binop_t>
                   > expression_t;

    struct literal_t
    {
        double value;       
    };

    struct binop_t
    {
        expression_t lhs, rhs;
    };
}} // ns

BOOST_FUSION_ADAPT_STRUCT
(
    client::ast::literal_t,
    (double, value)
)

BOOST_FUSION_ADAPT_STRUCT
(
    client::ast::binop_t,
    (client::ast::expression_t, lhs)
    (client::ast::expression_t, rhs)
)

namespace client {
    template <typename Iterator>
    struct grammar_t : qi::grammar<Iterator, ast::expression_t(), ascii::space_type>
    {
        qi::rule<Iterator, ast::literal_t(), ascii::space_type> literal;
        qi::rule<Iterator, ast::binop_t(), ascii::space_type> binop;
        qi::rule<Iterator, ast::expression_t(), ascii::space_type> primary_expr;
        qi::rule<Iterator, ast::expression_t(), ascii::space_type> expr;

        grammar_t() : grammar_t::base_type(expr)
        {
            expr = binop.alias();

            binop = primary_expr > qi::lit('+') > primary_expr;

            primary_expr = (qi::lit('(') > expr > qi::lit(')')) 
                         | literal;

            literal = qi::double_;

            expr.name("expr");
            binop.name("binop");
            literal.name("literal");
            qi::debug(expr);
            qi::debug(binop);
            qi::debug(literal);
        }
    };
} // ns

int main()
{
    try
    {
        string input = "0.1 + 1.2 ";
        std::string::const_iterator begin = input.begin();
        std::string::const_iterator end = input.end();  

        typedef std::string::const_iterator iterator_type;
        client::grammar_t<iterator_type> g;

        client::ast::expression_t ast;

        bool status;    
        status = qi::phrase_parse(begin, end, g, ascii::space, ast);
        EXPECT_TRUE(status);
        EXPECT_TRUE(begin == end);
    } catch (std::exception& e)
    {
        cout << e.what() << endl;
    }
}

Solution

  • VeXocide on the ##spirit IRC channel on freenode solved the problem (http://codepad.org/wufmFufE). The answer is to modify the grammar as follows:

        expr = binop.alias();
    
        binop = primary_expr >> qi::lit('+') >> (binop | primary_expr);
    
        primary_expr = (qi::lit('(') >> expr >> qi::lit(')'))
                     | literal;            
    
        literal = qi::double_;
    

    This grammar creates a right recursion that's able to synthesize the parse tree that I'm looking for.

    Tip for anyone who encountered the same problem: Without the Spirit debug statements, Boost Spirit will cause Seg Fault due to a stack overflow if you provide it with a left recursive grammar. If you turn on the debug statements, it will print out an "infinte" amount of text which tells you something is going wrong in the parser.