c++ boost boost-spirit

Why doesn't boost::spirit match foo123 with the (+alpha | +alnum) grammar?


I have a more complex boost::spirit grammar that doesn't match like I expected. I was able to break it down to this minimal example: http://ideone.com/oPu2e7 (doesn't compile there, but compiles with VS2010)

Basically this is my grammar:

my_grammar() : my_grammar::base_type(start)
{
    start %=
        (+alpha | +alnum)
    ;
}
qi::rule<Iterator, std::string(), ascii::space_type> start;

It matches foobar and 123foo, but doesn't match foo123. Why? I would expect it to match all three.


Solution

  • PEG parsers match greedily, left to right. That should be enough to explain it.

    But let's look at foo123: the input starts with one or more alpha characters, so the first branch (+alpha) matches foo and is taken. Because that branch succeeded, the second branch (+alnum) is never tried, and the digits 123 remain unparsed.

    There's no "inherent" backtracking on the Kleene operators. You /can/ employ backtracking if you know, e.g., that you need to parse the full input:

     (+alpha >> eoi | +alnum >> eoi)
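
To make the ordered-choice behaviour concrete, here is a minimal sketch in plain C++. It is a hand-written model of the two grammars, not Boost.Spirit code, and it ignores the space skipper. All names (one_or_more, plain_choice, anchored_choice, kFail) are hypothetical and chosen only for illustration. Each function returns how many characters were consumed, or kFail if nothing matched.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hand-written model of PEG ordered choice; NOT Boost.Spirit itself.
using Pos = std::size_t;
const Pos kFail = std::string::npos;  // hypothetical "no match" marker

bool is_alpha(unsigned char c) { return std::isalpha(c) != 0; }
bool is_alnum(unsigned char c) { return std::isalnum(c) != 0; }

// Greedy "one or more": consumes as many matching characters as possible
// starting at `pos`, and fails if not even one character matches.
template <typename Pred>
Pos one_or_more(const std::string& s, Pos pos, Pred pred) {
    Pos i = pos;
    while (i < s.size() && pred(static_cast<unsigned char>(s[i]))) ++i;
    return i > pos ? i : kFail;
}

// Models (+alpha | +alnum): the first branch that succeeds wins,
// and later branches are never tried once an earlier one has matched.
Pos plain_choice(const std::string& s) {
    Pos end = one_or_more(s, 0, is_alpha);
    if (end != kFail) return end;           // branch 1 taken, branch 2 skipped
    return one_or_more(s, 0, is_alnum);     // only reached if branch 1 failed
}

// Models (+alpha >> eoi | +alnum >> eoi): a branch only succeeds if it
// also reaches the end of input, so a partial +alpha match now fails
// and the parser falls back to the second branch.
Pos anchored_choice(const std::string& s) {
    Pos end = one_or_more(s, 0, is_alpha);
    if (end == s.size()) return end;        // +alpha >> eoi succeeded
    end = one_or_more(s, 0, is_alnum);
    if (end == s.size()) return end;        // +alnum >> eoi succeeded
    return kFail;
}
```

In this model, plain_choice("foo123") stops after the three letters because the first branch already succeeded, while anchored_choice("foo123") consumes all six characters: the first branch fails at the end-of-input check, so the second branch gets its turn.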