In my work I often have to compare strings and names to one another, and I recently came across the concept of Unicode equivalence, which may help generalize many of the operations we currently do by hand and perhaps handle a lot of other edge cases.
My question is: how can I perform Unicode normalization, or a Unicode-equivalent comparison of strings, in Java according to the rules defined in the Unicode equivalence specification?
A brief search of Stack Overflow, Google, and the Apache Commons Text library didn't turn up any tools that would allow me to do so.
Expanding on what's written in the comments, the built-in java.text.Normalizer class provides both an isNormalized and a normalize method. The normalize method takes an argument of the Normalizer.Form enum type (NFC, NFD, NFKC, or NFKD), which lets you specify which normalization form to apply. Example:
System.out.println(Normalizer.normalize("Some text", Normalizer.Form.NFKD));
Also, as noted here, each Java release supports a specific version of Unicode, so normalization results follow the rules of that Unicode version.
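To make the comparison part of the question concrete, here is a minimal sketch of an equivalence check built on Normalizer. The class name UnicodeEquivalence and the two helper methods are hypothetical names chosen for illustration, not part of the JDK. The sketch assumes you want canonical equivalence via NFC and compatibility equivalence via NFKC; any form works as long as both sides are normalized to the same one.

```java
import java.text.Normalizer;

public class UnicodeEquivalence {

    // Hypothetical helper: treats two strings as canonically equivalent
    // if their NFC forms are identical.
    public static boolean canonicallyEquivalent(String a, String b) {
        return Normalizer.normalize(a, Normalizer.Form.NFC)
                .equals(Normalizer.normalize(b, Normalizer.Form.NFC));
    }

    // Hypothetical helper: the looser compatibility equivalence,
    // compared through NFKC.
    public static boolean compatibilityEquivalent(String a, String b) {
        return Normalizer.normalize(a, Normalizer.Form.NFKC)
                .equals(Normalizer.normalize(b, Normalizer.Form.NFKC));
    }

    public static void main(String[] args) {
        String precomposed = "\u00E9";  // e with acute as a single code point
        String decomposed = "e\u0301";  // plain e followed by a combining acute accent
        String ligature = "\uFB01";     // the "fi" ligature character

        System.out.println(precomposed.equals(decomposed));                 // false
        System.out.println(canonicallyEquivalent(precomposed, decomposed)); // true
        System.out.println(canonicallyEquivalent(ligature, "fi"));          // false
        System.out.println(compatibilityEquivalent(ligature, "fi"));        // true

        // isNormalized checks the form without building a new string.
        System.out.println(Normalizer.isNormalized(decomposed, Normalizer.Form.NFC)); // false
    }
}
```

For name matching, compatibility equivalence (NFKC or NFKD) is usually the more forgiving choice, since it also folds ligatures and similar presentation variants. Note that it discards distinctions that canonical equivalence preserves.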