Using Boost 1.57 spirit::qi under Windows 7.
I'm working on an IPv6 parser and must be misunderstanding how the repeat parser directive works.
Given the following (simplified):
ipv6part = repeat(1, 4)[xdigit];
ipv6address =
-(repeat(1,4)[ipv6part >> lit(':')] >> ipv6part) >>
lit("::") >> ipv6part >> lit(':') >> ipv6part
| ...
I would expect to match the following addresses:
1111:5555::ffff:eeee
1111:2222:5555::ffff:eeee
1111:2222:3333:5555::ffff:eeee
1111:2222:3333:4444:5555::ffff:eeee
However, when I test, the only match is the maximum from the repeat clause:
1111:2222:3333:4444:5555::ffff:eeee
Now, explicitly specifying each combination matches all cases:
ipv6address =
-(repeat(1)[ipv6part >> lit(':')] >> ipv6part) >>
lit("::") >> ipv6part >> lit(':') >> ipv6part
| -(repeat(2)[ipv6part >> lit(':')] >> ipv6part) >>
lit("::") >> ipv6part >> lit(':') >> ipv6part
| -(repeat(3)[ipv6part >> lit(':')] >> ipv6part) >>
lit("::") >> ipv6part >> lit(':') >> ipv6part
| -(repeat(4)[ipv6part >> lit(':')] >> ipv6part) >>
lit("::") >> ipv6part >> lit(':') >> ipv6part
| ...
But this seems silly; it can't be right.
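It appears that repeat(N, M) is greedy and doesn't backtrack when the remainder of the parse fails. So, if the input is 11:22:55::ff:ee, the repeat part takes 11:22:55: (including the first colon of the '::'), leaving :ff:ee. The ipv6part that follows the repeat then fails on the ':', so the whole optional group matches nothing, and lit("::") is tried at the very start of the input, where it fails.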
I'm not quite sure from the docs whether this is intended behavior, but a workaround is to not consume a colon as the final character in the repeat, which avoids splitting the '::', like this:
ipv6address =
-(ipv6part >> repeat(1,4)[lit(':') >> ipv6part]) >> lit("::") >> ipv6part >> lit(':') >> ipv6part
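To make the difference concrete, here is a minimal hand-written sketch in plain C++ that models the two grammars. It is not Spirit code. The helper names (part, lit, tail, matches_original, matches_workaround) are made up for illustration. It assumes that a match must consume the whole input, and it models only the first alternative (the "| ..." branches are left out).

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hand-written model of greedy, non-backtracking matching; NOT Boost.Spirit.
// Each helper advances pos only when it succeeds.

// Models ipv6part = repeat(1, 4)[xdigit]
bool part(const std::string& s, std::size_t& pos) {
    std::size_t p = pos;
    while (p < s.size() && p - pos < 4 &&
           std::isxdigit(static_cast<unsigned char>(s[p]))) {
        ++p;
    }
    if (p == pos) return false;  // need at least one hex digit
    pos = p;
    return true;
}

// Models lit(text)
bool lit(const std::string& s, std::size_t& pos, const std::string& text) {
    if (s.compare(pos, text.size(), text) != 0) return false;
    pos += text.size();
    return true;
}

// Models lit("::") >> ipv6part >> lit(':') >> ipv6part, then end of input
// (the end-of-input check is an assumption of this sketch).
bool tail(const std::string& s, std::size_t pos) {
    return lit(s, pos, "::") && part(s, pos) && lit(s, pos, ":") &&
           part(s, pos) && pos == s.size();
}

// Models -(repeat(1,4)[ipv6part >> lit(':')] >> ipv6part) >> tail
bool matches_original(const std::string& s) {
    std::size_t pos = 0, n = 0;
    while (n < 4) {  // greedy: take as many "part:" groups as possible
        std::size_t save = pos;
        if (part(s, pos) && lit(s, pos, ":")) {
            ++n;
        } else {
            pos = save;  // a failed iteration consumes nothing
            break;
        }
    }
    // The repeat count is never reduced afterwards. If the trailing
    // ipv6part fails, the whole optional group matches nothing.
    if (n < 1 || !part(s, pos)) pos = 0;
    return tail(s, pos);
}

// Models -(ipv6part >> repeat(1,4)[lit(':') >> ipv6part]) >> tail
bool matches_workaround(const std::string& s) {
    std::size_t pos = 0, n = 0;
    if (part(s, pos)) {
        while (n < 4) {
            std::size_t save = pos;
            if (lit(s, pos, ":") && part(s, pos)) {
                ++n;
            } else {
                pos = save;  // the first ':' of "::" is given back here
                break;
            }
        }
    }
    if (n < 1) pos = 0;  // optional group failed, so it matches nothing
    return tail(s, pos);
}
```

In this model, matches_original accepts only the five-group address, because in the shorter ones the greedy loop eats the first colon of the '::'. matches_workaround accepts all four, because an iteration that starts with ':' but finds no hex digits after it is rolled back, leaving the '::' intact.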