c++ boost boost-spirit boost-spirit-qi

Boost Spirit optional parser and backtracking


Why does this parser leave 'b' in the attribute, even though the optional part wasn't matched?

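namespace qi = boost::spirit::qi;   // alias used by qi::rule and qi::parse below
using namespace boost::spirit::qi;  // brings char_ and lit into scope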

std::string str = "abc";

auto a = char_("a");
auto b = char_("b");
qi::rule<std::string::iterator, std::string()> expr;
expr = +a >> -(b >> +a);

std::string res;

bool r = qi::parse(
        str.begin(),
        str.end(),
        expr >> lit("bc"),
        res
);

It parses successfully, but res is "ab".

If I parse "abac" with expr alone, the optional part is matched and the attribute is "aba".

Likewise with "aac": the optional part doesn't start to match and the attribute is "aa".

But with "ab", the attribute is "ab", even though the b is backtracked over and, as in the example, is matched by the next parser.

Update

With expr.name("expr"); and debug(expr); I get:

<expr>
  <try>abc</try>
  <success>bc</success>
  <attributes>[[a, b]]</attributes>
</expr>

Solution

  • First, it is undefined behavior (UB) to store the expression templates in auto variables, because they hold references to the temporaries "a" and "b" [1].

    Instead write

    expr = +qi::char_("a") >> -(qi::char_("b") >> +qi::char_("a"));
    

    or, if you insist:

    auto a = boost::proto::deep_copy(qi::char_("a"));
    auto b = boost::proto::deep_copy(qi::char_("b"));
    expr = +a >> -(b >> +a);
    

    Now, the >> lit("bc") part hiding in the parse call suggests that you may expect successfully matched tokens to be backtracked over when a parse failure happens further down the road.

    That doesn't happen: Spirit generates PEG grammars, and always greedily matches from left to right.


    In the sample given, "ab" results: even though backtracking of the input does occur, the effects on the attribute are not rolled back without qi::hold.

    Container attributes are passed along by reference, and the effects of previous (successful) expressions are not rolled back unless you tell Spirit to. This way, you "pay for what you use" (copying temporaries all the time would be costly).
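
The mechanism can be modelled without Spirit at all. The following is a hand-rolled sketch using only the standard library; it is not Spirit's implementation, and all function names are invented for the illustration. Each matcher appends to the attribute by reference. On failure the input position is restored, but the attribute is not, unless a hold-like wrapper works on a copy and commits it only on success.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Stand-in for char_(c): match one character and append it to the attribute.
bool match_char(const std::string& in, std::size_t& pos, char c, std::string& attr) {
    if (pos < in.size() && in[pos] == c) { attr.push_back(c); ++pos; return true; }
    return false;
}

// Stand-in for +char_(c): one or more repetitions, greedy.
bool match_plus(const std::string& in, std::size_t& pos, char c, std::string& attr) {
    if (!match_char(in, pos, c, attr)) return false;
    while (match_char(in, pos, c, attr)) {}
    return true;
}

// Stand-in for b >> +a: the input position is restored on failure, the attribute is not.
bool match_b_then_as(const std::string& in, std::size_t& pos, std::string& attr) {
    const std::size_t save = pos;
    if (match_char(in, pos, 'b', attr) && match_plus(in, pos, 'a', attr)) return true;
    pos = save;  // backtrack the input only; 'b' stays in attr
    return false;
}

// Hold-like wrapper: parse into a copy and commit it only on success.
bool match_b_then_as_held(const std::string& in, std::size_t& pos, std::string& attr) {
    std::string copy = attr;
    if (!match_b_then_as(in, pos, copy)) return false;
    attr.swap(copy);
    return true;
}

// Models (+a >> -(b >> +a)) >> "bc"; returns the attribute, or "<fail>".
std::string run_model(const std::string& in, bool hold) {
    std::string attr;
    std::size_t pos = 0;
    if (!match_plus(in, pos, 'a', attr)) return "<fail>";
    // Optional part: its result is ignored, so it can never fail the parse.
    if (hold) match_b_then_as_held(in, pos, attr);
    else      match_b_then_as(in, pos, attr);
    if (in.compare(pos, 2, "bc") != 0) return "<fail>";
    return attr;
}

// Usage sketch.
void check_model() {
    assert(run_model("abc", false) == "ab");  // 'b' leaks from the failed branch
    assert(run_model("abc", true) == "a");    // the copy is discarded instead
}
```

The extra string copy in the hold-like wrapper is the cost that the default behaviour avoids.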

    See, for example, the debug trace of the grammar without qi::hold:

    <a>
      <try>abc</try>
      <success>bc</success>
      <attributes>[a]</attributes>
    </a>
    <a>
      <try>bc</try>
      <fail/>
    </a>
    <b>
      <try>bc</try>
      <success>c</success>
      <attributes>[b]</attributes>
    </b>
    <a>
      <try>c</try>
      <fail/>
    </a>
    <bc>
      <try>bc</try>
      <success></success>
      <attributes>[]</attributes>
    </bc>
    Success: 'ab'
    

    [1] see here: