Spirit X3: Custom number parser yields unexpected leading zero in the result

I'm writing a long-number parser that identifies a valid number (possibly not representable in a built-in integer type) and stores the string as-is. But the result includes an unexpected leading '0'.

The parser simply identifies numbers of the form 0xHHHHHH, 0bBBBBBBB, 0OOOOOOO or DDDDDDDDD.

To preserve the number prefix in the result, I use x3::string rather than x3::lit: the former has a string attribute, while the latter's attribute is unused.

Here is a link to the code: https://wandbox.org/permlink/E8mOpCcH3Svqb3FJ

And here is the same code, in case the link expires:

#include <boost/spirit/home/x3.hpp>
#include <iostream>

namespace x3 = boost::spirit::x3;
namespace fusion = boost::fusion;

using x3::_val;
using x3::_attr;
using x3::_where;
using fusion::at_c;

x3::rule<struct LongHexInt, std::string> const long_hex_int = "long_hex_int";

auto const long_hex_int_def = x3::lexeme[
    (x3::string("0") >> +x3::char_('0', '7'))
    | ((x3::digit - '0') >> *x3::digit >> 'u')
    | ((x3::string("0x") | x3::string("0X")) >> +x3::xdigit)
    | ((x3::string("0b") | x3::string("0B")) >> +x3::char_('0', '1'))
];

BOOST_SPIRIT_DEFINE(long_hex_int);

int main() {
    std::string input = R"__(0x12345678ABCDEF)__";
    std::string output;
    if (x3::parse(input.begin(), input.end(), long_hex_int, output)) {
        std::cout << output;
    }
}

As the result shows, the parser outputs 00x12345678ABCDEF rather than 0x12345678ABCDEF. I don't know where the additional '0' comes from.

After removing the first alternative on line 15, (x3::string("0") >> +x3::char_('0', '7')), the code produces the expected output. I don't know why, though: is this a bug, or is it my mistake?

Solution

I'd personally simplify. The common part of the number format could be written as:

auto const common 
    = x3::no_case["0x"] >> x3::hex
    | x3::no_case["0b"] >> x3::bin
    | &x3::lit('0') >> x3::oct
    | x3::uint_ >> 'u'
    ;

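This uses the built-in unsigned integer parsers. The linked page documents the Qi versions, but X3 provides the same x3::hex, x3::bin, x3::oct and x3::uint_: https://www.boost.org/doc/libs/1_71_0/libs/spirit/doc/html/spirit/qi/reference/numeric/uint.html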

Now you could parse that into the string representation:

auto const long_hex_int
    = x3::rule<struct long_hex_int_, std::string> {"long_hex_int"}
    = x3::lexeme [ x3::raw [ common ] ];

But you could just as easily parse directly into the integral type:

auto const unsigned_literal
    = x3::rule<struct unsigned_literal_, uint32_t> {"unsigned_literal"}
    = x3::lexeme [ common ];

In fact here is a live demo with test cases:

for (std::string const input : { 
    "0",
    "00",
    "010",
    "0x0", "0b0", "0x10", "0b10", "0x010", "0b010",
    "0X0", "0B0", "0X10", "0B10", "0X010", "0B010",
    // fails:
    "", "0x", "0b", "0x12345678ABCDEF" })
{
    std::string str;
    uint32_t num;
    if (x3::parse(input.begin(), input.end(), long_hex_int >> x3::eoi, str)) {
        std::cout << std::quoted(input) << " -> " << std::quoted(str) << "\n";
        if (x3::parse(input.begin(), input.end(), unsigned_literal, num)) {
            std::cout << " numerical: " << std::hex << "0x" << num << " (" << std::dec << num << ")\n";
        }
    } else {
        std::cout << std::quoted(input) << " -> FAILED\n";
    }
}

Printing:

"0" -> "0"
 numerical: 0x0 (0)
"00" -> "00"
 numerical: 0x0 (0)
"010" -> "010"
 numerical: 0x8 (8)
"0x0" -> "0x0"
 numerical: 0x0 (0)
"0b0" -> "0b0"
 numerical: 0x0 (0)
"0x10" -> "0x10"
 numerical: 0x10 (16)
"0b10" -> "0b10"
 numerical: 0x2 (2)
"0x010" -> "0x010"
 numerical: 0x10 (16)
"0b010" -> "0b010"
 numerical: 0x2 (2)
"0X0" -> "0X0"
 numerical: 0x0 (0)
"0B0" -> "0B0"
 numerical: 0x0 (0)
"0X10" -> "0X10"
 numerical: 0x10 (16)
"0B10" -> "0B10"
 numerical: 0x2 (2)
"0X010" -> "0X010"
 numerical: 0x10 (16)
"0B010" -> "0B010"
 numerical: 0x2 (2)
"" -> FAILED
"0x" -> FAILED
"0b" -> FAILED
"0x12345678ABCDEF" -> FAILED

Extending for 64 bits

Extending to more precision should make more inputs succeed, right?

It gets only slightly more annoying to write:

template <typename T = uint64_t>
auto const common 
    = x3::no_case["0x"] >> x3::uint_parser<T, 16>{}
    | x3::no_case["0b"] >> x3::uint_parser<T, 2>{}
    | &x3::lit('0')     >> x3::uint_parser<T, 8>{}
    | x3::uint_parser<T, 10>{} >> 'u'
    ;

But all the rest is the same, and your 64-bit example passes:

"0x12345678ABCDEF" -> 0x12345678abcdef (5124095577148911)

But 131! fails to parse, for obvious reasons:

"847158069087882051098456875815279568163352087665474498775849754305766436915303927682164623187034167333264599970492141556534816949699515865660644961729169613882287309922474300878212776434073600000000000000000000000000000000" -> FAILED
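
The reason is size: storing n! takes floor(log₂(n!)) + 1 bits, and log₂(n!) can be estimated without any big-integer arithmetic through std::lgamma, since lgamma(n + 1) = ln(n!). A quick sketch of that estimate (the helper name factorial_bits is made up here, and the result is a floating-point approximation):

```cpp
#include <cmath>

// Hypothetical helper: approximate number of bits needed to store n! (n >= 1).
// Uses lgamma(n + 1) = ln(n!) so the factorial itself is never computed.
// Being floating-point based, it could be off by one if n! were extremely
// close to a power of two.
inline int factorial_bits(int n) {
    double const log2_factorial = std::lgamma(n + 1.0) / std::log(2.0);
    return static_cast<int>(std::floor(log2_factorial)) + 1;
}
```

factorial_bits(131) evaluates to 738 (log₂(131!) is about 737.2), so 131! is far out of reach for a 64-bit type. By the same estimate, 20! needs 62 bits and 21! needs 66, which makes 20! the largest factorial that fits in uint64_t.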

Bonus: Arbitrary precision

131! requires around log₂(131!) ≅ 737 bits... But you don't need to fall back to lugging around strings. Just drop in uint1024_t (or checked_uint1024_t) from Boost Multiprecision and you're done:

using Number = boost::multiprecision::/*checked_*/uint1024_t;

And then

Number num;
if (x3::parse(input.begin(), input.end(), unsigned_literal<Number> >> x3::eoi, num)) {
    std::cout << std::quoted(input) << " -> " << std::hex << "0x" << num << " (" << std::dec << num << ")\n";
} else {
    std::cout << std::quoted(input) << " -> FAILED\n";
}

Note how nothing changed except uint64_t -> Number. And the output:

"0" -> 0x0 (0)
"00" -> 0x0 (0)
"010" -> 0x8 (8)
"0x0" -> 0x0 (0)
"0b0" -> 0x0 (0)
"0x10" -> 0x10 (16)
"0b10" -> 0x2 (2)
"0x010" -> 0x10 (16)
"0b010" -> 0x2 (2)
"0X0" -> 0x0 (0)
"0B0" -> 0x0 (0)
"0X10" -> 0x10 (16)
"0B10" -> 0x2 (2)
"0X010" -> 0x10 (16)
"0B010" -> 0x2 (2)
"0x12345678ABCDEF" -> 0x12345678ABCDEF (5124095577148911)
"847158069087882051098456875815279568163352087665474498775849754305766436915303927682164623187034167333264599970492141556534816949699515865660644961729169613882287309922474300878212776434073600000000000000000000000000000000u" -> 0x257F7A37BE2FBDD9980A97214F27DDC1E2FFA53ABBA836FFBE8AD1B9792E5D47A3C573A1B9C81D264662E41005A5D7432ADDBE44E3DDF12142D2B845FC9B184288345AD466B86A6685FE87AE100000000000000000000000000000000 (847158069087882051098456875815279568163352087665474498775849754305766436915303927682164623187034167333264599970492141556534816949699515865660644961729169613882287309922474300878212776434073600000000000000000000000000000000)