I'm hoping someone can shine a light through my ignorance of using the >
and >>
operators in spirit parsing.
I have a working grammar, where the top-level rule looks like
test = identifier >> operationRule >> repeat(1,3)[any_string] >> arrow >> any_string >> conditionRule;
It relies on attributes to automatically allocated parsed values to a fusion-adapted struct (ie a boost tuple).
However, I know that once we match the operationRule, we must continue or fail (ie we don't want to allow backtracking to try other rules that begin with identifier
).
test = identifier >>
operationRule > repeat(1,3)[any_string] > arrow > any_string > conditionRule;
This causes a cryptic compiler error ('boost::Container' : use of class template requires template argument list
). Futz around a bit and the following compiles:
test = identifier >>
(operationRule > repeat(1,3)[any_string] > arrow > any_string > conditionRule);
but the attribute setting no longer works - my data structure contains garbage after parsing. This can be fixed by adding actions like [at_c<0>(_val) = _1]
, but that seems a little clunky - as well as making things slower according to the boost docs.
So, my questions are
()
operationRule
is matched (I suspect not, it seems that if the entire parser inside the (...)
fails backtracking will be allowed)?operation
is /not/ matched, but does not allow backtracking once operation /is/ matched?I realise this is quite a broad question - any hints that point in the right direction will be greatly appreciated!
Is it worth preventing back-tracking?
Absolutely. Preventing back tracking in general is a proven way to improve parser performance.
Using expectation points (>
) does not essentially reduce backtracking: it will just disallow it. This will enable targeted error messages, prevent useless 'parsing-into-the-unknown'.
Why do I need the grouping operator ()
I'm not sure. I had a check using my what_is_the_attr
helpers from here
ident >> op >> repeat(1,3)[any] >> "->" >> any
synthesizes attribute:
fusion::vector4<string, string, vector<string>, string>
ident >> op > repeat(1,3)[any] > "->" > any
synthesizes attribute:
fusion::vector3<fusion::vector2<string, string>, vector<string>, string>
I haven't found the need to group subexpressions using parentheses (things compile), but obviously DataT
needs to be modified to match the changed layout.
typedef boost::tuple<
boost::tuple<std::string, std::string>,
std::vector<std::string>,
std::string
> DataT;
The full code below shows how I'd prefer to do that, using adapted structs.
Does my above example really stop back-tracking after operationRule is matched (I suspect not, it seems that if the entire parser inside the (...) fails backtracking will be allowed)?
Absolutely. If the expectation(s) is not met, a qi::expectation_failure<>
exception is thrown. This by default aborts the parse. You could use qi::on_error to retry
, fail
, accept
or rethrow
. The MiniXML example has very good examples on using expectation points with qi::on_error
If the answer to the previous question is /no/, how do I construct a rule that allows backtracking if operation is /not/ matched, but does not allow backtracking once operation /is/ matched?
Why does the grouping operator destroy the attribute grammar - requiring actions?
It doesn't destroy the attribute grammar, it just changes the exposed type. So, if you bind an appropriate attribute reference to the rule/grammar, you won't need semantic actions. Now, I feel there should be ways to go without the grouping , so let me try it (preferrably on your short selfcontained sample). And indeed I have found no such need. I've added a full example to help you see what is happening in my testing, and not using semantic actions.
The full code show 5 scenarios:
OPTION 1: Original without expectations
(no relevant changes)
OPTION 2: with expectations
Using the modified typedef for DataT (as shown above)
OPTION 3: adapted struct, without expectations
Using a userdefined struct with BOOST_FUSION_ADAPT_STRUCT
OPTION 4: adapted struct, with expectations
Modifying the adapted struct from OPTION 3
OPTION 5: lookahead hack
This one leverages a 'clever' (?) hack, by making all >>
into expectations, and detecting the presence of a operationRule
-match beforehand. This is of course suboptimal, but allows you to keep DataT
unmodified, and without using semantic actions.
Obviously, define OPTION
to the desired value before compiling.
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/karma.hpp>
#include <boost/spirit/include/phoenix.hpp>
#include <boost/fusion/adapted.hpp>
#include <iostream>
namespace qi = boost::spirit::qi;
namespace karma = boost::spirit::karma;
#ifndef OPTION
#define OPTION 5
#endif
#if OPTION == 1 || OPTION == 5 // original without expectations (OR lookahead hack)
typedef boost::tuple<std::string, std::string, std::vector<std::string>, std::string> DataT;
#elif OPTION == 2 // with expectations
typedef boost::tuple<boost::tuple<std::string, std::string>, std::vector<std::string>, std::string> DataT;
#elif OPTION == 3 // adapted struct, without expectations
struct DataT
{
std::string identifier, operation;
std::vector<std::string> values;
std::string destination;
};
BOOST_FUSION_ADAPT_STRUCT(DataT, (std::string, identifier)(std::string, operation)(std::vector<std::string>, values)(std::string, destination));
#elif OPTION == 4 // adapted struct, with expectations
struct IdOpT
{
std::string identifier, operation;
};
struct DataT
{
IdOpT idop;
std::vector<std::string> values;
std::string destination;
};
BOOST_FUSION_ADAPT_STRUCT(IdOpT, (std::string, identifier)(std::string, operation));
BOOST_FUSION_ADAPT_STRUCT(DataT, (IdOpT, idop)(std::vector<std::string>, values)(std::string, destination));
#endif
template <typename Iterator>
struct test_parser : qi::grammar<Iterator, DataT(), qi::space_type, qi::locals<char> >
{
test_parser() : test_parser::base_type(test, "test")
{
using namespace qi;
quoted_string =
omit [ char_("'\"") [_a =_1] ]
>> no_skip [ *(char_ - char_(_a)) ]
> lit(_a);
any_string = quoted_string | +qi::alnum;
identifier = lexeme [ alnum >> *graph ];
operationRule = string("add") | "sub";
arrow = "->";
#if OPTION == 1 || OPTION == 3 // without expectations
test = identifier >> operationRule >> repeat(1,3)[any_string] >> arrow >> any_string;
#elif OPTION == 2 || OPTION == 4 // with expectations
test = identifier >> operationRule > repeat(1,3)[any_string] > arrow > any_string;
#elif OPTION == 5 // lookahead hack
test = &(identifier >> operationRule) > identifier > operationRule > repeat(1,3)[any_string] > arrow > any_string;
#endif
}
qi::rule<Iterator, qi::space_type/*, qi::locals<char> */> arrow;
qi::rule<Iterator, std::string(), qi::space_type/*, qi::locals<char> */> operationRule;
qi::rule<Iterator, std::string(), qi::space_type/*, qi::locals<char> */> identifier;
qi::rule<Iterator, std::string(), qi::space_type, qi::locals<char> > quoted_string, any_string;
qi::rule<Iterator, DataT(), qi::space_type, qi::locals<char> > test;
};
int main()
{
std::string str("addx001 add 'str1' \"str2\" -> \"str3\"");
test_parser<std::string::const_iterator> grammar;
std::string::const_iterator iter = str.begin();
std::string::const_iterator end = str.end();
DataT data;
bool r = phrase_parse(iter, end, grammar, qi::space, data);
if (r)
{
using namespace karma;
std::cout << "OPTION " << OPTION << ": " << str << " --> ";
#if OPTION == 1 || OPTION == 3 || OPTION == 5 // without expectations (OR lookahead hack)
std::cout << format(delimit[auto_ << auto_ << '[' << auto_ << ']' << " --> " << auto_], data) << "\n";
#elif OPTION == 2 || OPTION == 4 // with expectations
std::cout << format(delimit[auto_ << '[' << auto_ << ']' << " --> " << auto_], data) << "\n";
#endif
}
if (iter!=end)
std::cout << "Remaining: " << std::string(iter,end) << "\n";
}
Output for all OPTIONS:
for a in 1 2 3 4 5; do g++ -DOPTION=$a -I ~/custom/boost/ test.cpp -o test$a && ./test$a; done
OPTION 1: addx001 add 'str1' "str2" -> "str3" --> addx001 add [ str1 str2 ] --> str3
OPTION 2: addx001 add 'str1' "str2" -> "str3" --> addx001 add [ str1 str2 ] --> str3
OPTION 3: addx001 add 'str1' "str2" -> "str3" --> addx001 add [ str1 str2 ] --> str3
OPTION 4: addx001 add 'str1' "str2" -> "str3" --> addx001 add [ str1 str2 ] --> str3
OPTION 5: addx001 add 'str1' "str2" -> "str3" --> addx001 add [ str1 str2 ] --> str3