Misunderstanding repeat directive - it should fail, but doesn't

I would like to write a grammar (highly simplified) with:

grr ::= integer [ . integer ]

with

integer ::= digit { [ underline ] digit }

Since the parsed literals are needed again later (the real grammar is more complex, and not everything can be converted to a number immediately), the literal must be stored in full as a string (more precisely, as an iterator_range) in the AST for later use, underlines included.

The problem now is that the literals can be longer than the later implementation/computation can handle. The obvious solution is the repeat directive (documented here in detail for Qi repeat, and only very briefly for X3).

This is where my problems start (coliru):

    for(std::string_view const s : {
        // ok
        "0", "10", "1_0", "012345", 
        // too long
        "0123456",
        "1_2_3_4_5_6_7_8_9_0", 
        // absolutely invalid
        "1_2_3_4_5_6_", "_0123_456", ""
    }) {
        auto const cs = x3::char_("0-9");
        std::string attr;
        bool const ok = x3::parse(std::begin(s), std::end(s), 
            x3::raw[ cs >> x3::repeat(0, 5)[ ('_' >> cs) | cs] ],
            attr);
        cout << s << " -> " << attr 
             << " (" << std::boolalpha << ok << ")"
             << "\n";   
    }

gives

0 -> 0 (true)
10 -> 10 (true)
1_0 -> 1_0 (true)
012345 -> 012345 (true)
0123456 -> 012345 (true)
1_2_3_4_5_6_7_8_9_0 -> 1_2_3_4_5_6 (true)
1_2_3_4_5_6_ -> 1_2_3_4_5_6 (true)
_0123_456 ->  (false)
 ->  (false)

If the literal is too long, the parser should fail, which it does not. If the literal ends with an underline, it should fail as well, but it doesn't. A leading underline and an empty literal are correctly rejected (parsed as false).

Meanwhile, I am trying to move the more complex parsers into separate parser classes, but there I am missing, for example, the rule that recognizes a literal ending with an underline.

Furthermore, BOOST_SPIRIT_X3_DEBUG seems to be broken all of a sudden - there is no output.

What is the solution to my problem? I'm out of ideas, other than a low-level and complicated approach using iterators, counters, etc.

This problem also affects other rules to be implemented.

Solution

If the literal is too long, the parser should fail

Where does it say that? It looks like the code does exactly what you ask: it parses at most 6 digits with the requisite underscores. The output even confirms that it does exactly that.

You can of course make it much more apparent by showing what was not parsed:

Live On Coliru

auto f = begin(s), l = end(s);
bool const ok = x3::parse(
    f, l, x3::raw[cs >> x3::repeat(0, 5)[('_' >> cs) | cs]], attr);

fmt::print(
    "{:21} -> {:5} {:13} remaining '{}'\n",
    fmt::format("'{}'", s),
    ok,
    fmt::format("'{}'", attr),
    std::string(f, l));

Prints

'0'                   -> true  '0'           remaining ''
'10'                  -> true  '10'          remaining ''
'1_0'                 -> true  '1_0'         remaining ''
'012345'              -> true  '012345'      remaining ''
'0123456'             -> true  '012345'      remaining '6'
'1_2_3_4_5_6_7_8_9_0' -> true  '1_2_3_4_5_6' remaining '_7_8_9_0'
'1_2_3_4_5_6_'        -> true  '1_2_3_4_5_6' remaining '_'
'_0123_456'           -> false ''            remaining '_0123_456'
''                    -> false ''            remaining ''
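To make the "partial match is still a match" behaviour concrete, here is a hand-written model of the rule `cs >> repeat(0, 5)[('_' >> cs) | cs]`. It is only a sketch in plain standard C++, not Spirit code, and the function name `match_prefix` is made up for illustration. It returns how many characters the rule would consume, which is all that the parser itself promises.

```cpp
#include <cctype>
#include <cstddef>
#include <optional>
#include <string_view>

// Hand-written model (NOT Spirit code) of
//     cs >> repeat(0, 5)[('_' >> cs) | cs]
// Returns the number of characters consumed, or nullopt if the
// mandatory first digit is missing. `match_prefix` is a hypothetical name.
std::optional<std::size_t> match_prefix(std::string_view s) {
    auto is_digit = [&](std::size_t i) {
        return i < s.size() &&
               std::isdigit(static_cast<unsigned char>(s[i])) != 0;
    };

    if (!is_digit(0))
        return std::nullopt;              // leading digit is required

    std::size_t pos = 1;
    for (int n = 0; n < 5; ++n) {         // at most 5 further iterations
        if (pos < s.size() && s[pos] == '_' && is_digit(pos + 1))
            pos += 2;                     // '_' followed by a digit
        else if (is_digit(pos))
            pos += 1;                     // plain digit
        else
            break;                        // repeat stops early: still a success
    }
    return pos;                           // whatever follows is left unparsed
}
```

Under this model, `match_prefix("0123456")` consumes 6 characters and simply leaves the trailing `6` unparsed, which corresponds to the `true` with a non-empty remainder shown in the output above.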

Fixes

To require that the complete input is parsed, either append x3::eoi to the parser or check the iterators after parsing:

Live On Coliru

bool const ok = x3::parse(
    f,
    l,
    x3::raw[cs >> x3::repeat(0, 5)[('_' >> cs) | cs]] >> x3::eoi,
    attr);

Prints

'0'                   -> true  '0'           remaining ''
'10'                  -> true  '10'          remaining ''
'1_0'                 -> true  '1_0'         remaining ''
'012345'              -> true  '012345'      remaining ''
'0123456'             -> false '012345'      remaining '0123456'
'1_2_3_4_5_6_7_8_9_0' -> false '1_2_3_4_5_6' remaining '1_2_3_4_5_6_7_8_9_0'
'1_2_3_4_5_6_'        -> false '1_2_3_4_5_6' remaining '1_2_3_4_5_6_'
'_0123_456'           -> false ''            remaining '_0123_456'
''                    -> false ''            remaining ''

Distinct Lexemes

If instead you want to allow the input to continue, just not with certain characters (e.g. when parsing many such "numbers"), assert inside the lexeme that no digit or underscore follows:

auto const number = x3::lexeme[ //
    x3::raw[cs >> x3::repeat(0, 5)[('_' >> cs) | cs]]
    // within the lexeme, assert that no digit or _ follows
    >> ! (cs | '_') //
];

Live On Coliru

//#define BOOST_SPIRIT_X3_DEBUG
#include <boost/spirit/home/x3.hpp>
#include <fmt/ranges.h>
#include <string>
#include <string_view>
#include <vector>
using namespace std::string_view_literals;

namespace Parser {
    namespace x3 = boost::spirit::x3;
    auto const cs = x3::digit;
    auto const number = x3::lexeme[ //
        x3::raw[cs >> x3::repeat(0, 5)[('_' >> cs) | cs]]
        // within the lexeme, assert that no digit or _ follows
        >> ! (cs | '_') //
    ];
    auto const ws_or_comment = x3::space | "//" >> *~x3::char_("\r\n");
    auto const numbers = x3::skip(ws_or_comment)[number % ','];
} // namespace Parser

int main()
{
    std::vector<std::string> attr;
    std::string_view const s =
        R"(0,
           10,
           1_0,
           012345,
           // too long
           0123456,
           1_2_3_4_5_6_7_8_9_0,
           // absolutely invalid
           1_2_3_4_5_6_,
           _0123_456)"sv;

    auto f = begin(s), l = end(s);
    bool const ok = parse(f, l, Parser::numbers, attr);

    fmt::print("{}: {}\nremaining '{}'\n", ok, attr, std::string(f, l));
}

Prints

true: ["0", "10", "1_0", "012345"]
remaining ',
           // too long
           0123456,
           1_2_3_4_5_6_7_8_9_0,
           // absolutely invalid
           1_2_3_4_5_6_,
           _0123_456'

Proving It

To drive home the point of checking inside the lexeme in the presence of otherwise insignificant whitespace:

auto const numbers = x3::skip(ws_or_comment)[*number];

With a slightly adjusted test input (removing the commas):

Live On Coliru

//#define BOOST_SPIRIT_X3_DEBUG
#include <boost/spirit/home/x3.hpp>
#include <fmt/ranges.h>
#include <string>
#include <string_view>
#include <vector>
using namespace std::string_view_literals;

namespace Parser {
    namespace x3 = boost::spirit::x3;
    auto const cs = x3::digit;
    auto const number = x3::lexeme[ //
        x3::raw[cs >> x3::repeat(0, 5)[('_' >> cs) | cs]]
        // within the lexeme, assert that no digit or _ follows
        >> ! (cs | '_') //
    ];
    auto const ws_or_comment = x3::space | "//" >> *~x3::char_("\r\n");
    auto const numbers = x3::skip(ws_or_comment)[*number];
} // namespace Parser

int main()
{
    std::vector<std::string> attr;
    std::string_view const s =
        R"(0
           10
           1_0
           012345
           // too long
           0123456
           1_2_3_4_5_6_7_8_9_0
           // absolutely invalid
           1_2_3_4_5_6_
           _0123_456)"sv;

    auto f = begin(s), l = end(s);
    bool const ok = parse(f, l, Parser::numbers, attr);

    fmt::print("{}: {}\nremaining '{}'\n", ok, attr, std::string(f, l));
}

Prints

true: ["0", "10", "1_0", "012345"]
remaining '0123456
           1_2_3_4_5_6_7_8_9_0
           // absolutely invalid
           1_2_3_4_5_6_
           _0123_456'