
Boost spirit lexeme and its attributes


I'm using a parser which skips white space. At one point I don't want to skip, so I want to use qi::lexeme. However, this either does not compile or messes up my results, and I especially can't grasp the latter. How are the attributes of a lexeme handled?

Here is an example of what I'm trying to do:

#include <iostream>
#include <iomanip>
#include <string>
#include <boost/spirit/include/qi.hpp>
#include <boost/fusion/include/vector.hpp>
#include <boost/fusion/include/for_each.hpp> // for fu::for_each below

namespace qi = boost::spirit::qi;
namespace fu = boost::fusion;

struct printer_type
{
    void operator() (int i) const
    {
        std::cout << i << ' ';
    }

    void operator() (std::string s) const
    {
        std::cout << '"' << s << '"' << ' ';
    }

} printer;

int main() {
    for (std::string str : { "1foo 13", "42 bar 13", "13cheese 8", "101pencil13" }) {
        auto iter = str.begin(), end = str.end();

        qi::rule<std::string::iterator, qi::blank_type, fu::vector<int, std::string, int>()> parser = qi::int_ >> +qi::alpha >> qi::int_;

        fu::vector<int, std::string, int> result;
        bool r = qi::phrase_parse(iter, end, parser, qi::blank, result);

        std::cout << " --- " << std::quoted(str) << " --- ";
        if (r) {
            std::cout << "parse succeeded: ";
            fu::for_each(result, printer);
            std::cout << '\n';
        } else {
            std::cout << "parse failed.\n";
        }

        if (iter != end) {
            std::cout << " Remaining unparsed: " << std::string(iter, str.end()) << '\n';
        }
    }
}

Notice this line:

qi::rule<std::string::iterator, qi::blank_type, fu::vector<int, std::string, int>()> parser = 
                      qi::int_ >> +qi::alpha >> qi::int_;

Okay, so we want an int, then a string and then another int. However, I don't want to skip white space between the first int and the string: there must be no space there. If I use lexeme, the synthesized attributes get messed up.

A run without lexeme gives the following results:

--- "1foo 13" --- parse succeeded: 1 "foo" 13 
 --- "42 bar 13" --- parse succeeded: 42 "bar" 13 
 --- "13cheese 8" --- parse succeeded: 13 "cheese" 8 
 --- "101pencil13" --- parse succeeded: 101 "pencil" 13 

So everything parses, which is good. However, the second example (42 bar 13) should not parse successfully, because there is a space between 42 and bar. Here is the result with lexeme around the first int and the string (qi::lexeme[qi::int_ >> +qi::alpha] >> qi::int_;):

" 0  "1foo 13" --- parse succeeded: 1 "
 --- "42 bar 13" --- parse failed.
 Remaining unparsed: 42 bar 13
 --- "13cheese 8" --- parse succeeded: 13 " 0 
" 0  "101pencil13" --- parse succeeded: 101 "

What!? I don't have the slightest clue what is going on; I'm happy for any enlightenment :)

Side question: I would like to leave out lexeme entirely and define a subrule which does not skip. How can I specify the attributes in this case?

The subrule then has the attribute fusion::vector<int, std::string>(), but I still want the main rule to have fusion::vector<int, std::string, int>() as its attribute, not fusion::vector<fusion::vector<int, std::string>, int>() (which does not compile anyway).


Solution

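  • Use the no_skip directive instead of lexeme: qi::int_ >> qi::no_skip[+qi::alpha] >> qi::int_. Like lexeme, no_skip disables skipping inside its subject, but it does not pre-skip, and its attribute is just the std::string of +qi::alpha, so the sequence attribute stays flat.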

     --- "1foo 13" --- parse succeeded: 1 "foo" 13 
     --- "42 bar 13" --- parse failed.
     Remaining unparsed: 42 bar 13
     --- "13cheese 8" --- parse succeeded: 13 "cheese" 8 
     --- "101pencil13" --- parse succeeded: 101 "pencil" 13 
    

    https://wandbox.org/permlink/PdS14l0b3qjJwz5S


    > Sooo.... what!? I have not the slightest clue what is going on, I'm happy for any enlightenment :)

    As @llonesmiz mentioned, the qi::lexeme[qi::int_ >> +qi::alpha] >> qi::int_ parser has the attribute tuple<tuple<int, std::string>, int>, and you have triggered the trac 8013 bug/misfeature twice here (first for the whole sequence parser, and again for the sequence inside lexeme).