java regex unicode thai

Match a Thai Script character in Java


Over the last two hours I have had a lot of sexy time with Thai script strings that slipped into my database. They collate mysteriously, mutate when output, have no natural order, and are generally a disaster.

I want to just ignore any strings with Thai Script characters, but I have no idea how:

Pattern.compile("\\p{Thai}") fails on init. Would "[ก-๛]" ever work? What's the correct way?


Solution

  • Thai is a Unicode block, and Unicode blocks should be specified as \p{In...}:

    Pattern.compile("\\p{InThai}")
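
To tie this back to the original goal of ignoring strings that contain Thai characters, here is a minimal sketch. The class name `ThaiFilter` and the methods `containsThai` and `withoutThai` are made up for illustration; only `Pattern.compile("\\p{InThai}")` comes from the answer above. The sample strings are written as Unicode escapes so the example does not depend on the source file encoding.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical helper, not part of any library.
class ThaiFilter {
    // \p{InThai} matches any character in the Thai Unicode block (U+0E00..U+0E7F).
    private static final Pattern THAI = Pattern.compile("\\p{InThai}");

    // True if the string contains at least one character from the Thai block.
    static boolean containsThai(String s) {
        return THAI.matcher(s).find();
    }

    // Returns a new list containing only the strings without Thai characters.
    static List<String> withoutThai(List<String> input) {
        return input.stream()
                .filter(s -> !containsThai(s))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList(
                "hello",
                "\u0E2A\u0E27\u0E31\u0E2A\u0E14\u0E35", // a Thai word
                "abc \u0E01 def");                      // mixed Latin and Thai
        System.out.println(withoutThai(rows)); // prints [hello]
    }
}
```

Note the use of `find()` rather than `matches()`: the pattern describes a single character, so `matches()` would only succeed on one-character strings. As for the literal range in the question, `"[\u0E01-\u0E5B]"` (that is, `[ก-๛]`) should also compile and match, provided the compiler reads the source file in the right encoding. On Java 7 and later, `\p{IsThai}` should work as well, since that release added script names with the `Is` prefix.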