Search code examples
c++boost-spirit-qi

Segmentation fault with recursive Spirit.Qi grammar


I'm trying to create a very simple parser for a very simplistic language that only contains numbers and mathematical expressions. Ultimately I plan to expand this but not until I can get these basic versions working.

I've successfully parsed:

1
425
1 + 1
1 - 1
1 * 1
1 / 1

No problem. But I wanted to make it recursive, let's say, to parse input like:

1 + 2 - 3

I began to get segmentation faults. I've done some googling around for recursive grammars and segmentation faults and I can't seem to apply anything I've found to this grammar to make it work. This is either due to them not fitting my situation or my failure to correctly understand what is happening with my qi grammar.

My grammar consists of the following structs (including fusion adaptations):

namespace fun_lang {
    namespace qi = boost::spirit::qi;
    namespace ascii = boost::spirit::ascii;
    namespace phoenix = boost::phoenix;
    namespace fusion = boost::fusion;

    struct number_node {
        long value;
    };

    struct operation_node;

    typedef boost::variant<
        boost::recursive_wrapper<operation_node>,
        number_node
    > node;

    struct operation_node {
        node left, right;
        char op;
    };

    struct program {
        std::vector<node> nodes;
    };
}

BOOST_FUSION_ADAPT_STRUCT(fun_lang::program, (std::vector<fun_lang::node>, nodes));
BOOST_FUSION_ADAPT_STRUCT(fun_lang::number_node, (long, value));
BOOST_FUSION_ADAPT_STRUCT(fun_lang::operation_node, (fun_lang::node, left) (char, op) (fun_lang::node, right));

namespace fun_lang {
    template <typename Iterator, typename Skipper>
    struct fun_grammar : qi::grammar<Iterator, program(), Skipper> {
        fun_grammar() : fun_grammar::base_type(start) {
            using ascii::char_;
            using qi::ulong_;
            using qi::_val;
            using qi::_1;

            using phoenix::push_back;
            using phoenix::at_c;

            expression = (integer | operation)[_val = _1];

            oper = (char_('+') | char_('-') | char_('*') | char_('/'))[_val = _1];
            integer = ulong_[at_c<0>(_val) = _1];

            operation = expression[at_c<0>(_val) = _1] >> oper[at_c<1>(_val) = _1] >> expression[at_c<2>(_val) = _1];

            start = *expression[push_back(at_c<0>(_val), _1)];
        }

        qi::rule<Iterator, program(), Skipper> start;
        qi::rule<Iterator, number_node(), Skipper> integer;
        qi::rule<Iterator, char(), Skipper> oper;
        qi::rule<Iterator, node(), Skipper> expression;
        qi::rule<Iterator, operation_node(), Skipper> operation;
    };
}

Some of the rule structures are based off a yacc grammar I wrote for another language which I was using as a reference for a way to structure these rules. I'm not sure what is causing the segmentation fault but I know when running this that is what I receive. I've tried simplifying rules, removing some intermediate rules, and testing non-recursive methods. Anything that is not recursive seems to work but I've seen many examples of Spirit with recursive rules that were successful so I feel like I'm just not quite understanding how to express those.

EDIT

For aid in solving the problem you can find a mostly exact copy on ideone. The only difference between the ideone version and what I have locally is instead of reading a file it pulls directly from standard input.


Solution

  • There are two sources of stack overflows (which end in segmentation faults). One is the constructor of operation_node and node. boost::variant, when default-constructed, is initialized with a default-constructed object of its first template argument. This is boost::recursive_wrapper<operation_node>, which constructs an operation_node, which constructs two nodes, which construct a boost::recursive_wrapper<operation_node>, and this goes on until the stack is exhausted.

    It is common to give variants in spirit grammars a nil type like struct nil { }; as first argument to prevent this and have a way to identify uninitialised variants, so

    struct nil { };
    
    typedef boost::variant<
        nil,
        boost::recursive_wrapper<operation_node>,
        number_node
    > node;
    

    will fix this. If you don't want to use a nil type,

    typedef boost::variant<
        number_node,
        boost::recursive_wrapper<operation_node>
    > node;
    

    will also work in your case because number_node can be constructed without issue.

    The other stack overflow is because Boost.Spirit generates LL(inf) parsers (as opposed to yacc, which generates LALR(1) parsers), which means that what you get is a recursive descent parser. The rules

    expression = (integer | operation)[_val = _1];
    operation = expression[at_c<0>(_val) = _1] >> oper[at_c<1>(_val) = _1] >> expression[at_c<2>(_val) = _1];
    

    generate a parser that descends from operation into expression and back into operation without consuming any input. This recurses until the stack overflows, and that is where you get your other segfault.

    If you reformulate the rule operation as

    operation = integer[at_c<0>(_val) = _1] >> oper[at_c<1>(_val) = _1] >> expression[at_c<2>(_val) = _1];
    

    this problem goes away. Furthermore, you'll have to rewrite the expression rule as

    expression = (operation | integer)[_val = _1];
    

    for the match to work as I think you expect, otherwise the integer part will successfully match before operation has a chance to be found, and the parser will not backtrack because it has a successful partial match.

    Also note that Spirit parsers are attributed; the parser actions you use are largely unnecessary. It is possible to rewrite the bulk of your grammar like this:

    expression = operation | integer;
    
    oper = char_("-+*/");
    integer = ulong_;
    
    operation = integer >> oper >> expression;