Looking at the Javadoc for java.util.regex.Pattern:
\p{Alnum}    An alphanumeric character: [\p{IsAlphabetic}\p{IsDigit}]
it appears that every character that matches \p{IsAlphabetic} should also match \p{Alnum}.
However, it does not seem to be the case when the character has an accent. For example, the following assertion fails:
assertEquals("é".matches("\\p{IsAlphabetic}+"),"é".matches("\\p{Alnum}+"));
The same thing happens for other accented characters such as ą, ó, ł, ź, and ż. All match \p{IsAlphabetic}+ but not \p{Alnum}+.
Am I misinterpreting the Javadoc? Or is this a bug in the documentation or the implementation?
By default, \p{Alnum} is treated as a POSIX character class, which means it will only ever match ASCII characters. So it will match a and 1 but not ä or ١.
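A minimal sketch of that default behavior, assuming a standard JDK. The class and method names are made up for illustration, and the non-ASCII characters are written as Unicode escapes so the result does not depend on the source file encoding:

```java
import java.util.regex.Pattern;

class PosixAlnumDemo {
    // No flags: \p{Alnum} is the POSIX class, i.e. ASCII letters and digits only.
    static boolean isAlnum(String s) {
        return Pattern.compile("\\p{Alnum}+").matcher(s).matches();
    }

    // \p{IsAlphabetic} is a Unicode binary property and is Unicode-aware by default.
    static boolean isAlphabetic(String s) {
        return Pattern.compile("\\p{IsAlphabetic}+").matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isAlnum("a"));           // true
        System.out.println(isAlnum("1"));           // true
        System.out.println(isAlnum("\u00E4"));      // false (ä)
        System.out.println(isAlnum("\u0661"));      // false (١, ARABIC-INDIC DIGIT ONE)
        System.out.println(isAlphabetic("\u00E4")); // true
    }
}
```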
The passage you quote only applies when the UNICODE_CHARACTER_CLASS flag is used.
Slightly oversimplified, this flag turns the "old" POSIX-style character classes into their equivalent Unicode character classes.
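For illustration, here is a small sketch of enabling the flag, either through Pattern.compile or through the embedded flag expression (?U). The class and method names are hypothetical, and the test characters are written as Unicode escapes to avoid source-encoding issues:

```java
import java.util.regex.Pattern;

class UnicodeAlnumDemo {
    // With UNICODE_CHARACTER_CLASS, \p{Alnum} behaves like [\p{IsAlphabetic}\p{IsDigit}].
    static boolean isUnicodeAlnum(String s) {
        return Pattern.compile("\\p{Alnum}+", Pattern.UNICODE_CHARACTER_CLASS)
                      .matcher(s)
                      .matches();
    }

    // The same flag enabled inline, which also works with String.matches.
    static boolean isUnicodeAlnumInline(String s) {
        return s.matches("(?U)\\p{Alnum}+");
    }

    public static void main(String[] args) {
        System.out.println(isUnicodeAlnum("\u00E9"));       // true (é)
        System.out.println(isUnicodeAlnum("\u0661"));       // true (١)
        System.out.println(isUnicodeAlnumInline("\u00E9")); // true
        System.out.println(isUnicodeAlnum("-"));            // false
    }
}
```

With the flag set, the assertion from the question should hold for é, since both patterns then classify it the same way.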