java, regex, unicode

\p{InLatin1Supplement} is an unknown Unicode character property name in Java regex


The docs for java.util.regex.Pattern specify:

Blocks are specified with the prefix In, as in InMongolian, or by using the keyword block (or its short form blk) as in block=Mongolian or blk=Mongolian.

The block names supported by Pattern are the valid block names accepted and defined by UnicodeBlock.forName.

and there is a constant Character.UnicodeBlock.LATIN_1_SUPPLEMENT, which UnicodeBlock.forName does find.

Even so, I'm getting a

java.util.regex.PatternSyntaxException: Unknown character property name {InLatin1Supplement} near index 21
\p{InLatin1Supplement}
                     ^

What's up with that?


Solution

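  • The Unicode name of the block is "Latin-1 Supplement", and UnicodeBlock.forName accepts only that canonical name, the canonical name with spaces removed (Latin-1Supplement), or the text of the constant's identifier with underscores (Latin_1_Supplement). Latin1Supplement drops the hyphen and matches none of these forms, so the property class in Java has to look like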

    \p{InLatin_1_Supplement}
    

    A quick check in Java:

    String s = "ëè";
    System.out.println(s.matches("\\p{InLatin_1_Supplement}+")); // -> true
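
To see which spellings are accepted, one option is to probe UnicodeBlock.forName and Pattern.compile directly. The following is a minimal sketch, not part of the original answer; the class name BlockNameDemo and the helpers isBlockName and compiles are made up for illustration.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class BlockNameDemo {

    // Hypothetical helper: true if Character.UnicodeBlock.forName recognizes the name.
    static boolean isBlockName(String name) {
        try {
            Character.UnicodeBlock.forName(name);
            return true;
        } catch (IllegalArgumentException e) {
            // forName throws IllegalArgumentException for unknown block names.
            return false;
        }
    }

    // Hypothetical helper: true if the regex compiles without a PatternSyntaxException.
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String[] names = {
            "Latin-1 Supplement",  // canonical Unicode block name
            "Latin-1Supplement",   // canonical name with spaces removed
            "Latin_1_Supplement",  // text of the constant's identifier
            "Latin1Supplement"     // hyphen dropped: the spelling from the question
        };
        for (String name : names) {
            System.out.println(name + " -> " + isBlockName(name));
        }
        System.out.println("\\p{InLatin_1_Supplement} compiles: "
                + compiles("\\p{InLatin_1_Supplement}"));
        System.out.println("\\p{InLatin1Supplement} compiles: "
                + compiles("\\p{InLatin1Supplement}"));
    }
}
```

Going by the forms documented for UnicodeBlock.forName, the first three names should be reported as valid and the last one as invalid, which is consistent with the PatternSyntaxException in the question.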