c++ parsing boost boost-spirit-qi

Boost Spirit - Extract list into single string


I'm having trouble understanding exactly how and when Spirit decides to merge matches into a single entity. What I am trying to do is match a list of words inside double square brackets and extract the full text inside the brackets. Example:

[[This is some single-spaced text]] -> "This is some single-spaced text"

My grammar is as follows:

qi::rule<Iterator, std::string()> word  = +(char_ - char_(" []"));
qi::rule<Iterator, std::string()> entry = lit("[[") >> word >> *(char_(' ') >> word) >> lit("]]") >> -qi::eol;

std::string text;
bool r = parse( first, last, entry, text );

However, this parses the example text as follows:

[[This is some single-spaced text]] -> "Thisissomesingle-spacedtext"

I don't understand why this is happening. I'm not using lit for the space, and as far as I understand Spirit, none of my rules or parsers should skip whitespace. I'm also not sure how to verify that my grammar produces the attribute I want (for example, that each space is concatenated into the string rather than ending up in a tuple with each word).

What should I do to obtain the result I want?


Solution

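  • You're probably reading the input from a (string)stream. Streams skip whitespace by default, so the spaces are dropped before Spirit ever sees them. In that case, you will want to set std::noskipws on the stream: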

    #include <boost/spirit/include/qi.hpp>
    #include <iostream>
    #include <sstream>
    #include <string>
    
    namespace qi = boost::spirit::qi;
    
    int main()
    {
        typedef boost::spirit::istream_iterator Iterator;
    
        std::istringstream iss("[[This is some single-spaced text]]");
        qi::rule<Iterator, std::string()> entry = "[[" >> qi::lexeme [ +(qi::char_ - "]]") ] >> "]]";
    
        // this is key: stop the stream from skipping whitespace
        iss >> std::noskipws; // or, equivalently:
        iss.unsetf(std::ios::skipws);
    
        Iterator f(iss), l;
        std::string parsed;
        if (qi::parse(f, l, entry >> -qi::eol, parsed))
            std::cout << "Parsed: '" << parsed << "'\n";
        else
            std::cout << "Failed\n";
    
        // report any input the parser left unconsumed
        if (f != l)
            std::cout << "Remaining: '" << std::string(f, l) << "'\n";
    }
    

    Prints

    Parsed: 'This is some single-spaced text'