java, php, regex, reluctant-quantifiers, non-greedy

Java Regexp: UNGREEDY flag


I'd like to port a generic text processing tool, Texy!, from PHP to Java.

This tool uses ungreedy matching everywhere, via preg_match_all("/.../U"), so I am looking for a Java regex library that has an UNGREEDY flag.

I know I could use the .*? syntax, but there are a great many regular expressions I would have to rewrite, and then re-check with every updated version.
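
For reference, here is a minimal sketch of the difference the reluctant .*? syntax makes in java.util.regex. The class name, the firstMatch helper and the sample input are illustrative assumptions, not part of Texy!.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ReluctantDemo {

    // Hypothetical helper: returns the first match of regex in input, or null.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String input = "<b>one</b> and <b>two</b>";
        // Greedy: .* runs to the last </b>, so the whole input matches.
        System.out.println(firstMatch("<b>.*</b>", input));
        // Reluctant: .*? stops at the first </b>, so only <b>one</b> matches.
        System.out.println(firstMatch("<b>.*?</b>", input));
    }
}
```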

I've checked

  • ORO - seems to be abandoned
  • Jakarta Regexp - no support
  • java.util.regex - no support

Is there any such library?

Thanks, Ondra


Solution

  • I suggest you create your own modified Java library. Simply copy the java.util.regex source into your own package.

    The Sun JDK 1.6 Pattern.java class defines these constants for the quantifier types:

    static final int GREEDY     = 0;
    static final int LAZY       = 1;
    static final int POSSESSIVE = 2;

    You'll notice that these constants are only used in a few places, so the code would be trivial to modify. Take the following example:

        case '*':
            ch = next();
            if (ch == '?') {
                next();
                return new Curly(prev, 0, MAX_REPS, LAZY);
            } else if (ch == '+') {
                next();
                return new Curly(prev, 0, MAX_REPS, POSSESSIVE);
            }
            return new Curly(prev, 0, MAX_REPS, GREEDY);
    

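    Simply change the last line to use LAZY instead of GREEDY. To mirror PHP's /U fully, where a trailing ? makes a quantifier greedy again, also change the '?' branch to use GREEDY, and make the same change in the cases for the other quantifiers. Since you want a regex library that behaves like the PHP one, this might be the best way to go.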
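
If patching a copy of Pattern.java is not an option, a different technique (not the one described above) is to invert the quantifiers in the pattern string before handing it to the stock java.util.regex. The following is only a rough sketch under stated assumptions: UngreedyPattern, invertGreediness and compileUngreedy are hypothetical names, brackets inside character classes are assumed to be escaped or properly nested, and patterns compiled with Pattern.COMMENTS are not handled.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class UngreedyPattern {

    // Matches a counted quantifier such as {3}, {2,} or {2,5}.
    private static final Pattern COUNTED = Pattern.compile("\\{\\d+(,\\d*)?\\}");

    /** Hypothetical helper: returns regex with the greediness of each quantifier inverted. */
    static String invertGreediness(String regex) {
        StringBuilder out = new StringBuilder();
        Matcher counted = COUNTED.matcher(regex);
        int n = regex.length();
        int classDepth = 0; // greater than 0 while inside [...]
        int i = 0;
        while (i < n) {
            char c = regex.charAt(i);
            int end = i + 1; // index just past the token that starts at i
            boolean quantifier = false;
            if (regex.startsWith("\\Q", i)) {
                // Quoted literal: copy verbatim up to and including \E.
                int close = regex.indexOf("\\E", i + 2);
                end = (close < 0) ? n : close + 2;
            } else if (c == '\\') {
                end = Math.min(i + 2, n);
                // \p{..}, \P{..}, \x{..} and \N{..} take a braced argument.
                if (end < n && "pPxN".indexOf(regex.charAt(i + 1)) >= 0
                        && regex.charAt(end) == '{') {
                    int close = regex.indexOf('}', end);
                    if (close >= 0) end = close + 1;
                }
            } else if (c == '[') {
                classDepth++;
            } else if (c == ']' && classDepth > 0) {
                classDepth--;
            } else if (classDepth > 0) {
                // Ordinary character inside a class: copied unchanged.
            } else if (regex.startsWith("(?", i)) {
                end = i + 2; // group syntax such as (?: or (?=, not a quantifier
            } else if (c == '*' || c == '+' || c == '?') {
                quantifier = true;
            } else if (c == '{' && counted.region(i, n).lookingAt()) {
                end = counted.end();
                quantifier = true;
            }
            out.append(regex, i, end);
            i = end;
            if (quantifier) {
                if (i < n && regex.charAt(i) == '?') {
                    i++;             // reluctant becomes greedy: drop the '?'
                } else if (i < n && regex.charAt(i) == '+') {
                    out.append('+'); // possessive stays possessive
                    i++;
                } else {
                    out.append('?'); // greedy becomes reluctant
                }
            }
        }
        return out.toString();
    }

    /** Hypothetical stand-in for compiling a PHP /U pattern with the stock library. */
    static Pattern compileUngreedy(String regex, int flags) {
        return Pattern.compile(invertGreediness(regex), flags);
    }

    public static void main(String[] args) {
        Matcher m = compileUngreedy("<b>.*</b>", 0).matcher("<b>one</b> and <b>two</b>");
        if (m.find()) {
            System.out.println(m.group()); // prints <b>one</b>
        }
    }
}
```

The trade-off is that the rewriting happens on the pattern text, so it has to re-implement a small part of the regex grammar, whereas the modified Pattern.java changes the behaviour at the point where quantifiers are parsed.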