c++ boost-spirit-x3

Boost.Spirit X3 -- operator minus does not work as expected


Consider the following code:

TEST_CASE("Requirements Parser Description", "[test]")
{
    namespace x3 = ::boost::spirit::x3;

    std::string s = "### Description\n\nSome\nmultiline\ntext."
                    "\n\n### Attributes";

    std::string expectedValue = "Some\nmultiline\ntext.";

    auto rule = x3::lit("### ") >> x3::lit("Description")
        >> (x3::lexeme
                [+x3::char_
                 - (x3::lit("###") >> *x3::space >> x3::lit("Attributes"))]);

    std::string value;
    bool success = x3::phrase_parse(s.begin(), s.end(), rule, x3::space, value);
    REQUIRE(success);
    REQUIRE(value == expectedValue);
}

which yields the following output:

test_boost_spirit_x3_parser.cpp:183: FAILED:
  REQUIRE( value == expectedValue )
with expansion:
  "Some
  multiline
  text.
  
  ### Attributes"
  ==
  "Some
  multiline
  text."

Any explanation why the minus operator does not work as I expect? Any fixes at hand?


Solution

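  • The cause is operator precedence. In C++ the unary + operator binds more tightly than the binary - operator, so the difference is applied to the whole repetition +x3::char_ rather than to each single character.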

    From the Boost manual: the difference parser (operator -) matches LHS but not RHS.

    LHS is +x3::char_

    RHS is (x3::lit("###") >> *x3::space >> x3::lit("Attributes"))

    The difference parser tests RHS only once, at the position where it starts. There the input reads "Some", so RHS does not match and LHS is allowed to run. LHS +x3::char_ then matches as many characters as it can get (greedy match), so it consumes

    Some
    multiline
    text.

    ### Attributes

    and RHS is never tested again. As a result, the - operator succeeds with the whole remaining input (LHS yes, RHS no), which is exactly what you are seeing.

    Or, to put it another way: your +x3::char_ eats up all remaining characters, and the - operator never gets a chance to stop it.
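The difference between the two groupings can be reproduced without Boost. The following is a toy model, not Spirit's actual implementation: the names any_char, literal, one_or_more, difference, parse_as_written and parse_as_intended are made up for illustration, and the model assumes, as in Spirit, that the difference parser tries its right-hand side first and only at the current position.

```cpp
#include <cstddef>
#include <functional>
#include <string>

// Toy parser: on success it advances pos and returns true,
// on failure it returns false and leaves pos unchanged.
using Parser = std::function<bool(const std::string&, std::size_t&)>;

// Matches any single character (stand-in for x3::char_).
Parser any_char() {
    return [](const std::string& in, std::size_t& pos) {
        if (pos >= in.size()) return false;
        ++pos;
        return true;
    };
}

// Matches a fixed string (stand-in for x3::lit).
Parser literal(std::string text) {
    return [text](const std::string& in, std::size_t& pos) {
        if (in.compare(pos, text.size(), text) != 0) return false;
        pos += text.size();
        return true;
    };
}

// One or more repetitions of p (stand-in for unary +).
Parser one_or_more(Parser p) {
    return [p](const std::string& in, std::size_t& pos) {
        if (!p(in, pos)) return false;
        while (p(in, pos)) {}
        return true;
    };
}

// Difference (stand-in for binary -): fail if rhs matches at the
// current position, otherwise hand control to lhs.
Parser difference(Parser lhs, Parser rhs) {
    return [lhs, rhs](const std::string& in, std::size_t& pos) {
        std::size_t probe = pos;               // rhs must not consume input
        if (rhs(in, probe)) return false;
        return lhs(in, pos);
    };
}

// Returns the text p consumes from the start of in (empty on failure).
std::string consumed(const Parser& p, const std::string& in) {
    std::size_t pos = 0;
    return p(in, pos) ? in.substr(0, pos) : std::string();
}

// Models  +char_ - stop : the stop text is checked once, up front.
std::string parse_as_written(const std::string& in, const std::string& stop) {
    return consumed(difference(one_or_more(any_char()), literal(stop)), in);
}

// Models  +(char_ - stop) : the stop text is checked before every character.
std::string parse_as_intended(const std::string& in, const std::string& stop) {
    return consumed(one_or_more(difference(any_char(), literal(stop))), in);
}
```

With the input "text.\n\n### Attributes" and the stop text "###", parse_as_written returns the whole input, while parse_as_intended returns only "text.\n\n".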

    To fix it, apply the difference to each character instead of to the whole repetition, i.e. write

    +(x3::char_ - (x3::lit...))

    That is at least what I gather from the example in the Spirit Qi documentation (the operator has the same meaning in X3): https://www.boost.org/doc/libs/1_78_0/libs/spirit/doc/html/spirit/qi/reference/operator/difference.html

    test_parser("/*A Comment*/", "/*" >> *(char_ - "*/") >> "*/");

    Note the parentheses around (char_ - "*/").