Tags: java, regex, string, pattern-matching, posix

Case-insensitive POSIX regex is not case-insensitive in Java Pattern & Matcher


I am not an expert in regex, so there might be an obvious reason, but I cannot find an answer to this.

I use POSIX notation to match a String ("n") with a regex in Java in a case-insensitive way. Given:

Pattern pattern = Pattern.compile("\\p{Upper}", Pattern.CASE_INSENSITIVE); 
Matcher matcher = pattern.matcher("n");

Why does the following code result in false?

boolean find = matcher.find();

In the Pattern documentation, I found the following (emphasis mine):

\p{Upper} An upper-case alphabetic character: [A-Z]

Tested with the regex [A-Z] instead, the following results in true:

Pattern pattern = Pattern.compile("[A-Z]", Pattern.CASE_INSENSITIVE); 
Matcher matcher = pattern.matcher("n");
boolean find = matcher.find();

What is the difference?


Solution

  • For better or worse, the POSIX character classes ignore the CASE_INSENSITIVE flag. Although \p{Upper} is documented as [A-Z], the two are not implemented the same way, and the \p{Upper} check never consults the case-insensitive flag.

    The code in the Pattern class that checks POSIX character classes doesn't refer to the CASE_INSENSITIVE flag:

    /**
     * Node class that matches a POSIX type.
     */
    static final class Ctype extends BmpCharProperty {
        final int ctype;
        Ctype(int ctype) { this.ctype = ctype; }
        boolean isSatisfiedBy(int ch) {
            return ch < 128 && ASCII.isType(ch, ctype);
        }
    }
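
To make the difference concrete, here is a minimal sketch that runs the case from the question next to two alternatives that do not rely on the flag at all. The class name PosixCaseDemo and the helper finds are hypothetical names chosen for this example. The \p{Upper} result is printed rather than assumed: the Ctype code above yields false, but the outcome may depend on the JDK release in use.

```java
import java.util.regex.Pattern;

class PosixCaseDemo {
    // Hypothetical helper: true if the regex finds a match anywhere in the input.
    static boolean finds(String regex, int flags, String input) {
        return Pattern.compile(regex, flags).matcher(input).find();
    }

    public static void main(String[] args) {
        // The case from the question. No expected value is stated here on
        // purpose: the answer above describes false, but this may vary by JDK.
        System.out.println(finds("\\p{Upper}", Pattern.CASE_INSENSITIVE, "n"));

        // An explicit range honors the flag, as shown in the question.
        System.out.println(finds("[A-Z]", Pattern.CASE_INSENSITIVE, "n")); // true

        // Alternatives that name both cases explicitly and need no flag.
        System.out.println(finds("\\p{Alpha}", 0, "n"));               // true
        System.out.println(finds("[\\p{Upper}\\p{Lower}]", 0, "n"));   // true
    }
}
```

If the intent is "any ASCII letter regardless of case", spelling that out with \p{Alpha} or with both \p{Upper} and \p{Lower} avoids depending on how the flag interacts with POSIX classes.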