java string unicode

How to perform Unicode equivalence checks / normalization in Java


In my work I often have to compare strings and names to one another. I recently came across the concept of Unicode equivalence, which may help generalize many of the operations we currently do manually and may also solve a lot of other edge cases.

My question is: how can I perform Unicode normalization, or a Unicode-equivalent comparison of strings, in Java according to the rules defined in the Unicode equivalence specification?

A brief search of Stack Overflow, Google, and the Apache Commons Text library didn't turn up any tools that would allow me to do so.


Solution

  • Expanding on what's written in the comments: the built-in java.text.Normalizer class provides both an isNormalized and a normalize method. The normalize method takes an argument of the Normalizer.Form enum type (NFC, NFD, NFKC, or NFKD) that specifies which normalization form to apply. Example:

    // Requires: import java.text.Normalizer;
    System.out.println(Normalizer.normalize("Some text", Normalizer.Form.NFKD));
    

    Also, as noted here, each Java release supports a specific version of the Unicode Standard, so the normalization result for recently added characters can depend on the Java version you run.
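
To address the comparison part of the question, here is a minimal sketch of how equivalence checks could be built on top of Normalizer. The class name UnicodeEquivalence and the two helper methods are hypothetical names made up for illustration, not part of the JDK. The sketch assumes that comparing NFC forms is acceptable for canonical equivalence and comparing NFKC forms for compatibility equivalence. It does no case folding or locale handling.

```java
import java.text.Normalizer;

// Hypothetical helper class, for illustration only (not a JDK API).
class UnicodeEquivalence {

    // Canonical equivalence: both strings have the same NFC form.
    static boolean canonicallyEquivalent(String a, String b) {
        return Normalizer.normalize(a, Normalizer.Form.NFC)
                .equals(Normalizer.normalize(b, Normalizer.Form.NFC));
    }

    // Compatibility equivalence: both strings have the same NFKC form.
    // This is looser, e.g. it treats ligatures like their component letters.
    static boolean compatibilityEquivalent(String a, String b) {
        return Normalizer.normalize(a, Normalizer.Form.NFKC)
                .equals(Normalizer.normalize(b, Normalizer.Form.NFKC));
    }

    public static void main(String[] args) {
        String precomposed = "\u00E9";  // e with acute as a single code point
        String decomposed = "e\u0301";  // plain e followed by a combining acute accent
        String ligature = "\uFB01";     // the "fi" ligature as a single code point

        System.out.println(precomposed.equals(decomposed));                  // false
        System.out.println(canonicallyEquivalent(precomposed, decomposed));  // true
        System.out.println(canonicallyEquivalent(ligature, "fi"));           // false
        System.out.println(compatibilityEquivalent(ligature, "fi"));         // true

        // isNormalized reports whether a string is already in the given form.
        System.out.println(Normalizer.isNormalized(decomposed, Normalizer.Form.NFC)); // false
    }
}
```

Which form to compare with depends on the use case: NFC or NFD keeps visually distinct compatibility characters apart, while NFKC or NFKD folds them together, which is often what name matching wants but does lose information.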