I'm trying to parse a string that starts with a fixed prefix followed by a number, something like abc_.+ \d+, but I'm running into problems with it.
Here is the test code:
#define BOOST_SPIRIT_DEBUG
#include <iostream>
#include <vector>
#include <string>
#include <iterator>
#include <iomanip>
#include <boost/spirit/include/qi.hpp>
#include <boost/fusion/include/adapt_struct.hpp>
#include <boost/phoenix/phoenix.hpp>

namespace qi = boost::spirit::qi;

struct S {
    std::string s;
    int n = 0;
};
BOOST_FUSION_ADAPT_STRUCT(S, s, n)

struct parser : qi::grammar<std::string::const_iterator, S(), qi::ascii::space_type> {
    typedef std::string::const_iterator Iterator;
    qi::rule<Iterator, S(), qi::ascii::space_type> start;
    qi::rule<Iterator, std::string(), qi::ascii::space_type> abc;

    parser() : parser::base_type(start) {
        abc = qi::raw[ "abc_" >> +(qi::alnum)];
        //abc = qi::raw[ "abc_" >> +(qi::alpha)];
        start %= abc >> qi::int_;
        BOOST_SPIRIT_DEBUG_NODES( (start)(abc))
    }
};

int main() {
    using boost::spirit::ascii::space;
    parser g;
    for(std::string str : {"abc 1", "abc_ 1", "abc_aaa 1", "abc_555 1", "cba_aaa 1"}) {
        std::cout << str << " - ";
        std::string::const_iterator iter = str.begin();
        std::string::const_iterator end = str.end();
        S s;
        bool r = phrase_parse(iter, end, g, space, s);
        if(r)
            std::cout << "Ok";
        else
            std::cout << "fail";
        std::cout << std::endl;
    }
}
For some reason, qi::alnum also consumes space:
abc_aaa 1 - <start>
<try>abc_aaa 1</try>
<abc>
<try>abc_aaa 1</try>
<success></success>
<attributes>[[a, b, c, _, a, a, a, , 1]]</attributes>
</abc>
<fail/>
</start>
fail
If I change it to qi::alpha:
abc_aaa 1 - <start>
<try>abc_aaa 1</try>
<abc>
<try>abc_aaa 1</try>
<success>1</success>
<attributes>[[a, b, c, _, a, a, a, ]]</attributes>
</abc>
<success></success>
<attributes>[[[a, b, c, _, a, a, a, ], 1]]</attributes>
</start>
Ok
it works fine, but then it is impossible to parse tokens like abc_123.
Any advice?
Thanks!
Because you have provided a skipper, the sequence parser (>>) and the plus parser (+), as well as some other compound parsers, apply it between the matches of their primitive parsers. As a result, the abc rule effectively matches abc_(\s*[0-9a-zA-Z])+.
For exactly this case there is the lexeme directive, which disables skipping where it is not wanted. Using it as abc = qi::raw[qi::lexeme["abc_" >> +qi::alnum]] makes the abc rule match abc_[0-9a-zA-Z]+, and the whole grammar then matches abc_[0-9a-zA-Z]+\s*\d+.