Boost spirit parse string starts with a prefix

I'm trying to parse a string that starts with a fixed prefix, followed by a number, roughly the regex abc_.+ \d+, but I'm running into problems. Here is my test code:

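#define BOOST_SPIRIT_DEBUG  // must come before the Spirit include to enable the <try>/<attributes> trace
#include <iostream>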
#include <vector>
#include <string>
#include <iterator>
#include <iomanip>
#include <boost/spirit/include/qi.hpp>
#include <boost/fusion/include/adapt_struct.hpp>
#include <boost/phoenix/phoenix.hpp>

namespace qi = boost::spirit::qi;

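struct S {
    std::string s;
    int n = 0;
};

// Adapt S as a Fusion sequence so Qi can fill it from `abc >> qi::int_`.
BOOST_FUSION_ADAPT_STRUCT(S, (std::string, s)(int, n))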

struct parser : qi::grammar<std::string::const_iterator, S(), qi::ascii::space_type> {
    typedef std::string::const_iterator Iterator;

    qi::rule<Iterator, S(), qi::ascii::space_type> start;
    qi::rule<Iterator, std::string(), qi::ascii::space_type> abc;

    parser() : parser::base_type(start) {
        abc = qi::raw[ "abc_" >> +(qi::alnum)];
        //abc = qi::raw[ "abc_" >> +(qi::alpha)];
        start %=  abc >> qi::int_;
        BOOST_SPIRIT_DEBUG_NODES( (start)(abc))
    }
};

int main() {
    using boost::spirit::ascii::space;
    parser g;

    for(std::string str : {"abc 1", "abc_ 1", "abc_aaa 1", "abc_555 1", "cba_aaa 1"}) {
        std::cout << str << " - ";
        std::string::const_iterator iter = str.begin();
        std::string::const_iterator end = str.end();
        S s;
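        bool r = phrase_parse(iter, end, g, space, s);
        // Count it as a match only if the whole input was consumed.
        if (r && iter == end) {
            std::cout << "Ok";
        } else {
            std::cout << "fail";
        }
        std::cout << std::endl;
    }
}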

For some reason, qi::alnum also consumes space:

abc_aaa 1 - <start>
  <try>abc_aaa 1</try>
    <try>abc_aaa 1</try>
    <attributes>[[a, b, c, _, a, a, a,  , 1]]</attributes>

If I change it to qi::alpha:

abc_aaa 1 - <start>
  <try>abc_aaa 1</try>
    <try>abc_aaa 1</try>
    <attributes>[[a, b, c, _, a, a, a,  ]]</attributes>
  <attributes>[[[a, b, c, _, a, a, a,  ], 1]]</attributes>

it works fine, but then it is impossible to parse tokens like abc_123.

Any advice?


Try it on Coliru


  • Because you have provided a skipper, the sequence parser and the plus parser (among others) apply it between the matches of their primitive sub-parsers. As a result, your abc rule effectively matches abc_(\s*[0-9a-zA-Z])+, so it swallows the space and the trailing digit as well.

    For exactly this case there is the lexeme directive, which disables skipping where it is not wanted. Writing abc = qi::raw[qi::lexeme["abc_" >> +qi::alnum]] makes the rule match abc_[0-9a-zA-Z]+, and the whole grammar then matches abc_[0-9a-zA-Z]+\s*\d+.