Boost Spirit repetition parser behaves unexpectedly


I'm using Boost 1.57 spirit::qi under Windows 7.
I'm working on an IPv6 parser and must be misunderstanding how the repetition parser directives work. Given the following (simplified) grammar:

ipv6part = repeat(1, 4)[xdigit];

ipv6address =
-(repeat(1,4)[ipv6part >> lit(':')] >> ipv6part) >> 
    lit("::") >> ipv6part >> lit(':') >> ipv6part
| ...

I would expect to match the following addresses:

1111:5555::ffff:eeee
1111:2222:5555::ffff:eeee
1111:2222:3333:5555::ffff:eeee
1111:2222:3333:4444:5555::ffff:eeee

However, when I test, the only address that matches is the one that uses the maximum count from the repeat clause:

1111:2222:3333:4444:5555::ffff:eeee

Now, explicitly specifying each combination matches all cases:

ipv6address =
-(repeat(1)[ipv6part >> lit(':')] >> ipv6part) >> 
    lit("::") >> ipv6part >> lit(':') >> ipv6part
| -(repeat(2)[ipv6part >> lit(':')] >> ipv6part) >> 
    lit("::") >> ipv6part >> lit(':') >> ipv6part
| -(repeat(3)[ipv6part >> lit(':')] >> ipv6part) >> 
    lit("::") >> ipv6part >> lit(':') >> ipv6part
| -(repeat(4)[ipv6part >> lit(':')] >> ipv6part) >> 
    lit("::") >> ipv6part >> lit(':') >> ipv6part
| ...

But this seems silly; it can't be right.


Solution

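  • It appears that repeat(N,M) doesn't backtrack when the remainder of the parse fails. The repeat is greedy: it keeps taking "ipv6part >> ':'" groups for as long as they match, and it does not give one back when what follows fails. So, if the input is 11:22:55::ff:ee, the repeat part takes 11:22:55: (including the first colon of the '::'), leaving :ff:ee. The ipv6part that follows the repeat then sees a colon and fails.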

    I'm not quite sure from the docs whether this is intended behavior, but a workaround is to not grab a colon as the final character in the repeat. That avoids splitting the '::', like this:

    ipv6address =
    -(ipv6part >> repeat(1,4)[lit(':') >> ipv6part]) >> lit("::") >> ipv6part >> lit(':') >> ipv6part
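
To make the difference between the two rules concrete, here is a small hand-rolled model of the matching described above. It is not Boost.Spirit code, and it rests on an assumption: that repeat(min,max) takes as many iterations as it can, treats each iteration as all-or-nothing, and never gives back a completed iteration. That matches the behavior observed in this answer, but it is a sketch, not a statement of what Qi does internally. All names (Pos, repeat, tail, original_grammar, workaround_grammar) are hypothetical, and only the first alternative of the rule is modelled, with the whole input required to match.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hand-rolled model, NOT Boost.Spirit. All names here are hypothetical.
using Pos = std::size_t;

// ipv6part: 1 to 4 hex digits. Advances pos only on success.
bool ipv6part(const std::string& s, Pos& pos) {
    Pos p = pos;
    while (p < s.size() && p - pos < 4 &&
           std::isxdigit(static_cast<unsigned char>(s[p]))) {
        ++p;
    }
    if (p == pos) return false;
    pos = p;
    return true;
}

// lit: matches a fixed string. Advances pos only on success.
bool lit(const std::string& s, Pos& pos, const std::string& text) {
    if (s.compare(pos, text.size(), text) != 0) return false;
    pos += text.size();
    return true;
}

// Assumed repeat semantics: greedy, each iteration all-or-nothing,
// completed iterations are never handed back.
template <typename Step>
bool repeat(int min, int max, Pos& pos, Step step) {
    assert(min <= max);
    Pos p = pos;
    int n = 0;
    while (n < max) {
        Pos q = p;
        if (!step(q)) break;  // a failed iteration consumes nothing
        p = q;
        ++n;
    }
    if (n < min) return false;
    pos = p;
    return true;
}

// Shared tail: "::" ipv6part ':' ipv6part, then end of input.
bool tail(const std::string& s, Pos pos) {
    return lit(s, pos, "::") && ipv6part(s, pos) &&
           lit(s, pos, ":") && ipv6part(s, pos) && pos == s.size();
}

// -(repeat(1,4)[ipv6part >> ':'] >> ipv6part) >> tail
bool original_grammar(const std::string& s) {
    Pos pos = 0;
    Pos p = pos;
    bool head = repeat(1, 4, p, [&](Pos& q) {
                    return ipv6part(s, q) && lit(s, q, ":");
                }) && ipv6part(s, p);
    if (head) pos = p;  // optional group: on failure it matches nothing
    return tail(s, pos);
}

// -(ipv6part >> repeat(1,4)[':' >> ipv6part]) >> tail
bool workaround_grammar(const std::string& s) {
    Pos pos = 0;
    Pos p = pos;
    bool head = ipv6part(s, p) && repeat(1, 4, p, [&](Pos& q) {
                    return lit(s, q, ":") && ipv6part(s, q);
                });
    if (head) pos = p;  // optional group: on failure it matches nothing
    return tail(s, pos);
}
```

Under these assumptions, original_grammar("1111:5555::ffff:eeee") returns false: the repeat swallows "1111:5555:" and the trailing ipv6part has nothing to match. workaround_grammar accepts the same input, because the iteration that would start with the first colon of '::' fails as a whole and consumes nothing. Both functions accept 1111:2222:3333:4444:5555::ffff:eeee, where the repeat stops at its maximum of four before it can reach the '::'.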