Boost spirit is too greedy


I'm torn between deep admiration for boost::spirit and eternal frustration at not understanding it ;)

I have a problem with string rules that are too greedy, so the overall parse fails. Below is a minimal example that doesn't parse because the txt rule eats up end.

More information about what I'd like to do: the goal is to parse some pseudo-SQL while skipping whitespace. In a statement like

select foo.id, bar.id from foo, baz 

I need to treat from as a special keyword. The rule is something like

"select" >> txt % ',' >> "from" >> txt % ',' 

but it obviously doesn't work, as it sees bar.id from foo as one item.

#include <boost/spirit/include/qi.hpp>
#include <iostream>
#include <string>
namespace qi = boost::spirit::qi;
int main(int, char**) {
    auto txt = +(qi::char_("a-zA-Z_"));
    auto rule = qi::lit("Hello") >> txt % ',' >> "end";
    std::string str = "HelloFoo,Moo,Bazend";
    std::string::iterator begin = str.begin();
    if (qi::parse(begin, str.end(), rule))
        std::cout << "Match !" << std::endl;
    else
        std::cout << "No match :'(" << std::endl;
}

Solution

  • Here's my version, with changes marked:

    #include <boost/spirit/include/qi.hpp>
    #include <algorithm>
    #include <iostream>
    #include <iterator>
    #include <string>
    namespace qi = boost::spirit::qi;
    int main(int, char**) {
      auto txt = qi::lexeme[+(qi::char_("a-zA-Z_"))];     // CHANGE: avoid eating spaces
      auto rule = qi::lit("Hello") >> txt % ',' >> "end";
      std::string str = "Hello Foo, Moo, Baz end";        // CHANGE: re-introduce spaces
      std::string::iterator begin = str.begin();
      if (qi::phrase_parse(begin, str.end(), rule, qi::ascii::space)) {          // CHANGE: use phrase_parse with a skipper
        std::cout << "Match !" << std::endl << "Remainder (should be empty): '"; // CHANGE: show if we parsed the whole string and not just a prefix
        std::copy(begin, str.end(), std::ostream_iterator<char>(std::cout));
        std::cout << "'" << std::endl;
      }
      else {
        std::cout << "No match :'(" << std::endl;
      }
    }
    

    This compiles and runs with GCC 4.4.3 and Boost 1.4something; output:

    Match !
    Remainder (should be empty): ''
    

    By using lexeme, you suppress skipping inside txt, so txt matches only up to a word boundary instead of continuing across spaces. This yields the desired result: because "Baz" is not followed by a comma, and txt doesn't skip over the space after it, the list ends there and "end" is left for the final literal.

    Anyway, I'm not 100% sure this is what you're looking for -- in particular, is str missing spaces as an illustrative example, or are you somehow forced to use this (spaceless) format?

    Side note: if you want to make sure that you've parsed the entire string, add a check to see if begin == str.end(). As stated, your code will report a match even if only a non-empty prefix of str was parsed.

    Update: Added suffix printing.