I am not a regex expert, and there may be an obvious reason for this, but I cannot find an answer.
I use POSIX notation to match a String (n) with a regex in Java in a case-insensitive way. Given:
Pattern pattern = Pattern.compile("\\p{Upper}", Pattern.CASE_INSENSITIVE);
Matcher matcher = pattern.matcher("n");
Why does the following code result in false?
boolean find = matcher.find();
In the Pattern documentation, I found the following (emphasis mine):
\p{Upper} An upper-case alphabetic character: [A-Z]
Tested against the regex [A-Z], the following results in true:
Pattern pattern = Pattern.compile("[A-Z]", Pattern.CASE_INSENSITIVE);
Matcher matcher = pattern.matcher("n");
boolean find = matcher.find();
What is the difference?
Rightly or wrongly, the POSIX character classes ignore the CASE_INSENSITIVE flag. Although \p{Upper} matches the same characters as [A-Z], it is not exactly the same: it does not look at the case-insensitive flag.
The code in the Pattern class that checks POSIX character classes does not refer to the CASE_INSENSITIVE flag:
/**
 * Node class that matches a POSIX type.
 */
static final class Ctype extends BmpCharProperty {
    final int ctype;
    Ctype(int ctype) { this.ctype = ctype; }
    boolean isSatisfiedBy(int ch) {
        return ch < 128 && ASCII.isType(ch, ctype);
    }
}
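If the goal is simply to match an ASCII letter of either case, one possible workaround is to avoid relying on the flag and use a class that already covers both cases, such as \p{Alpha}. The sketch below is only an illustration: the class name PosixCaseDemo and its helper methods are hypothetical, not part of the JDK. It deliberately does not state an expected result for \p{Upper} combined with CASE_INSENSITIVE, since that is the case discussed above and the outcome may depend on the JDK version in use.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; the name and helpers are illustrative only.
final class PosixCaseDemo {

    // Returns true if the regex, compiled with the given flags,
    // finds a match anywhere in the input.
    static boolean finds(String regex, int flags, String input) {
        return Pattern.compile(regex, flags).matcher(input).find();
    }

    // Workaround sketch: match an ASCII letter of either case without
    // CASE_INSENSITIVE, using \p{Alpha} (documented as [\p{Lower}\p{Upper}]).
    static boolean containsAsciiLetter(String input) {
        return finds("\\p{Alpha}", 0, input);
    }

    public static void main(String[] args) {
        // Explicit range with the flag: the case-insensitive match succeeds.
        System.out.println(finds("[A-Z]", Pattern.CASE_INSENSITIVE, "n")); // true

        // POSIX class without the flag: "n" is not an upper-case letter.
        System.out.println(finds("\\p{Upper}", 0, "n")); // false

        // POSIX class with the flag: the case from the question.
        // No expected value is given here on purpose.
        System.out.println(finds("\\p{Upper}", Pattern.CASE_INSENSITIVE, "n"));

        // Workaround: matches regardless of case, no flag needed.
        System.out.println(containsAsciiLetter("n")); // true
    }
}
```

If only the explicit range is needed, [A-Z] with CASE_INSENSITIVE (as in the question) or [a-zA-Z] without the flag should behave the same way for ASCII input.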