
"Empty" interval quantifiers in Java Pattern class


In Java's Pattern class, a quantifier on its own, like +, or a pile of extra quantifiers, like a++++, is not allowed; both cases throw an exception (because of a dangling meta character). However, Pattern does allow an interval quantifier on its own ({1}, {2,6}), which matches the empty string between characters. What's more, the number in the interval doesn't matter, so {1} and {99999} match alike. That may be reasonable (any amount of "nothing" fits in the same space), but it can be misleading. As a result, regular expressions like:

a{2}
a{2}{34}{9999,99999}
a{2}{45}
a{2}+{234}+{9,999999999}?

are all valid patterns and match exactly the same input. In effect, it is a useless feature.
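A quick way to check these claims on a given JDK is a sketch along the following lines. The class name QuantifierCompileCheck and the helpers compiles and spans are hypothetical names chosen for illustration; only Pattern, Matcher, and PatternSyntaxException come from the standard library. The output for the interval-only patterns is deliberately not annotated, since it depends on the JDK in use.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class QuantifierCompileCheck {

    // True if this JDK's Pattern accepts the regex, false if compilation throws.
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    // Collects every match of the regex in the input as "start-end" index pairs,
    // so two patterns can be compared by what they actually match.
    static List<String> spans(String regex, String input) {
        List<String> result = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            result.add(m.start() + "-" + m.end());
        }
        return result;
    }

    public static void main(String[] args) {
        String[] candidates = {
            "+", "a++++",                      // dangling quantifiers
            "{1}", "{2,6}",                    // interval with nothing to quantify
            "a{2}", "a{2}{34}{9999,99999}",    // patterns from the question
            "a{2}{45}", "a{2}+{234}+{9,999999999}?"
        };
        for (String regex : candidates) {
            if (compiles(regex)) {
                System.out.println(regex + " compiles; matches in \"aaaa\": "
                        + spans(regex, "aaaa"));
            } else {
                System.out.println(regex + " is rejected");
            }
        }
    }
}
```

If the behavior described above holds on the JDK being tested, the last four candidates should print the same list of spans as a{2}.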

In most regex flavours, with the exception of Ruby as far as I know, this use of an interval is not allowed. Both kinds of quantifier are treated equally and must be preceded by a quantifiable element to form a valid pattern.


What is the reason behind this behavior in the Pattern class? Is it just a bug, or an engine difference? Why treat an interval differently from other quantifiers?


Solution

  • The reason is that the Java devs got it wrong; there's no excuse for not treating intervals like other quantifiers. I checked, and it's been like this since JDK 1.4, when regexes were first added. So it's not a regression, but I would call it a bug. What's the point of quantifying nothing?

    I'm not a fan of what Ruby does, treating a{2}{3} as two 'a's, three times (same as (?:a{2}){3}), but at least that makes sense, and it has precedent: GNU ERE (egrep, awk, emacs) works the same way.

    By the way, a{2}+ is valid--silly, but valid. Every flavor I know of that supports possessive quantifiers allows it, pointless though it is. It's the same with the reluctant modifier (a{2}?): disallowing it for non-variable quantifiers would have been just as confusing as allowing it, so they went with the option that was easier to support.

    But {234}+{9,999999999}? is just whack. It blows my mind that that actually compiles in Java.
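To make the comparison concrete, here is a minimal Java sketch of the two points above: the Ruby-style reading of a{2}{3} written out explicitly as (?:a{2}){3}, and the possessive and reluctant modifiers on a fixed count. The class name FixedCountQuantifiers and the helper fullMatch are hypothetical names for illustration; the sketch only shows how Java treats the explicit forms and makes no claim about how other engines parse a{2}{3}.

```java
import java.util.regex.Pattern;

class FixedCountQuantifiers {

    // True if the whole input matches the regex.
    static boolean fullMatch(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    public static void main(String[] args) {
        // Explicit grouping: two 'a's, three times, so exactly six 'a's.
        System.out.println(fullMatch("(?:a{2}){3}", "aaaaaa")); // true
        System.out.println(fullMatch("(?:a{2}){3}", "aa"));     // false

        // With a fixed count there is nothing to give back or to expand,
        // so the possessive and reluctant forms behave like plain a{2}.
        System.out.println(fullMatch("a{2}", "aa"));   // true
        System.out.println(fullMatch("a{2}+", "aa"));  // true
        System.out.println(fullMatch("a{2}?", "aa"));  // true
        System.out.println(fullMatch("a{2}+", "aaa")); // false
        System.out.println(fullMatch("a{2}?", "aaa")); // false
    }
}
```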