I'm working through the bonus levels on regex golf, and I'm currently on the modulus problem. The regex engine used is "theoretically ECMAScript, but browser implementations vary, often by version." I don't know off the top of my head which version my browser (Firefox 34.0) runs.
Basically, the idea is to match an expression of the form
x* % x+ = x*
where the quantity of repeated x
's represent numbers. The catch is to only match valid modulo operations.
My best solution so far is the following:
^(?=x+ % (x+) )\1*(x*) % x+ = \2$
That is, I use a lookahead to get the number of x
's in the second group, match that pattern as many times as I can, then get a backreference to the remainder which must be the pattern on the right hand side.
Now as far as this goes it appears to work, but it's failing (matching erroneously) in two specific cases:
xxxxx % xxxxx = xxxxx
xxxxxxxxxxxxxx % xxx = xxxxx
One of the cool features of that particular regex golf implementation is that it shows you the part of the string that is being matched. What's really interesting is that if I take off the end-of-line binding (the $
), the matched region for both goes from the beginning of the line to the ^
below:
xxxxx % xxxxx = xxxxx
^
xxxxxxxxxxxxxx % xxx = xxxxx
^
This is exactly as I would suspect -- the first one gobbles up the entire second x
group, so \2
ends up being empty. In the second, the real result is 2, so \2
is xx
, and that's all that gets matched. But when I add the anchor, the match jumps to the end of the line.
My expression works for these:
xxxxxxxxxxxx % xx = x
xxxxx % xxx = xxxx
where the results are also 0 and 2 respectively.
So what's up? Am I missing a fundamental problem in the logic of my expression?
I figured out what's happening. The regex engine can choose to match the \1
pattern FEWER times than it might, and in both cases this allows it to extend the \2
match so it equals the rhs. Now to figure out how to force it to be REALLY greedy... maybe another lookahead? Advice is welcome.
The problem is that $2 might match more than $1 x
s.
You can solve it with a simple negative lookahead: (https://regex101.com/r/oY1mV7/1)
^(?=x+ % (x+) )\1*(?!\1)(x*) % x+ = \2$
OR (https://regex101.com/r/oY1mV7/2)
\b(x*)\1*(?!\1)(x+) % \1\b = \2\b
Another option without lookaround is to use a possessive quantifier, but that is not supported at all in JavaScript: (https://regex101.com/r/oY1mV7/3)
\b(x*)\1*+(x+) % \1\b = \2\b