I am trying to use boost spirit to create a parser for a simple language. The first statement I am trying to parse is a simple numeric addition: "3.14 + 1". It segfaults, and my research indicates it is because of a left-recursive implementation, but I can't wrap my head around the solution for this that isn't left-recursive. Here is my parser implementation:
template<typename Iterator>
struct simple_grammar : qi::grammar<Iterator, AstNodePtr(), ascii::space_type> {
simple_grammar() : simple_grammar::base_type(start) {
using qi::double_;
using qi::_1;
using qi::_2;
using qi::_3;
auto addhelper = '+' >> expression;
auto add = expression >> addhelper;
expression = add[qi::_val = make_shared_<AddNode>()(_1, _2)]
| double_[qi::_val = make_shared_<DoubleNode>()(_1)];
start = expression;
}
qi::rule<Iterator, AstNodePtr(), ascii::space_type> start;
qi::rule<Iterator, AstNodePtr(), ascii::space_type> expression;
};
AstNodePtr is a typedef for a std::shared_ptr to an AstNodeBase class, from which DoubleNode and AddNode inherit. I didn't include those definitions just to keep the post short, but can if requested. It is pretty self-explanatory I think, the idea is to compose an AST. The make_shared_ implementation comes from here.
Thanks in advance!
There are a number of things about your example.
First off, you use auto
with Spirit Qi expressions. That's not valid, and leads to UB:
Next off, you chose to use polymorphic Ast nodes. That's possible but likely not efficient:
Finally, there's the left recursion, since your expression starts with itself, leading to infinite recursion. The only way to solve it is to split your productions up into "levels" of expressions. This also aids in generating the desired operator precedence:
expression = term >> char_("-+") >> term;
term = factor >> char_("*/%") >> factor;
factor = simple >> char_("^") >> simple;
In your case, I'd suggest:
simple
= qi::double_
[_val = make_shared_<DoubleNode>()(_1)];
;
expression
= (simple >> '+' >> expression)
[_val = make_shared_<AddNode>()(_1, _2)]
| simple
;
Of course, you can be simpler and slightly less inefficient here:
expression
= simple [_val = _1]
>> *(('+' >> expression)
[_val = make_shared_<AddNode>()(_val, _0)])
;
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/phoenix.hpp>
namespace { // https://stackoverflow.com/a/21565350/85371
template <typename T>
struct make_shared_f
{
template <typename... A> struct result
{ typedef std::shared_ptr<T> type; };
template <typename... A>
typename result<A...>::type operator()(A&&... a) const {
return std::make_shared<T>(std::forward<A>(a)...);
}
};
}
template <typename T>
using make_shared_ = boost::phoenix::function<make_shared_f<T> >;
struct AstNode {
virtual ~AstNode() = default;
};
using AstNodePtr = std::shared_ptr<AstNode>;
struct DoubleNode : AstNode {
DoubleNode(double) {}
};
struct AddNode : AstNode {
AddNode(AstNodePtr, AstNodePtr) {}
};
#include <iomanip>
namespace qi = boost::spirit::qi;
template<typename Iterator>
struct simple_grammar : qi::grammar<Iterator, AstNodePtr()> {
simple_grammar() : simple_grammar::base_type(start) {
using namespace qi::labels;
simple
= qi::double_
[_val = make_shared_<DoubleNode>()(_1)];
;
expression
= simple [_val = _1]
>> *(('+' >> expression)
[_val = make_shared_<AddNode>()(_val, _1)])
;
start = qi::skip(qi::space) [ expression ];
BOOST_SPIRIT_DEBUG_NODES((start)(expression)(simple))
}
private:
qi::rule<Iterator, AstNodePtr()> start;
qi::rule<Iterator, AstNodePtr(), qi::space_type> expression;
// implicit lexemes
qi::rule<Iterator, AstNodePtr()> simple;
};
int main() {
simple_grammar<std::string::const_iterator> g;
for (std::string const input : {
"1 + 2",
"3.14"
})
{
auto f = begin(input), l = end(input);
AstNodePtr ast;
if (qi::parse(f, l, g, ast)) {
std::cout << "Succeeded: " << boost::core::demangle(typeid(*ast).name()) << "\n";
} else {
std::cout << "Failed\n";
}
if (f!=l)
std::cout << "Remaining unparsed " << std::quoted(std::string(f,l)) << "\n";
}
}
Prints
Succeeded: AddNode
Succeeded: DoubleNode