How to normalize all special characters but umlauts?


I'd like to normalize all extended ASCII characters, but leave umlauts untouched.

If I wanted to normalize umlauts as well, I would use:

// requires: import java.text.Normalizer;
String normalized = Normalizer.normalize(value, Normalizer.Form.NFKD)
    .replaceAll("\\p{InCombiningDiacriticalMarks}+", "");

But how can I exclude German umlauts?

As a result, I would like to get:

source: üöäâÇæôøñÁ

desired result: üöäaCaeoonA or similar


Solution

  • I see two solutions here: the first is quite dirty, and the second is, I guess, rather tedious to implement.
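
For illustration, here is a minimal sketch of one possible approach; it is not necessarily either of the two solutions meant above. The idea is to decompose with NFKD as in the question, strip every combining mark except a diaeresis (U+0308) that directly follows a, o, u, A, O or U, and then recompose with NFC so the umlauts come back as single characters. The class name `UmlautPreservingNormalizer` and the pattern constant are hypothetical. The final replacements for æ and ø are an added assumption: those letters have no Unicode decomposition, so the normalizer alone leaves them unchanged.

```java
import java.text.Normalizer;

final class UmlautPreservingNormalizer {

    // Matches every combining mark that should be dropped:
    //  - a diaeresis (U+0308) NOT directly preceded by a/o/u/A/O/U, or
    //  - any other mark in the Combining Diacritical Marks block.
    // A diaeresis directly after a/o/u/A/O/U (a German umlaut) is kept.
    private static final String MARKS_TO_STRIP =
        "(?<![aouAOU])\\u0308|[\\p{InCombiningDiacriticalMarks}&&[^\\u0308]]";

    static String normalize(String value) {
        // 1. Decompose: each accented letter becomes base letter + combining mark(s).
        String decomposed = Normalizer.normalize(value, Normalizer.Form.NFKD);

        // 2. Remove all combining marks except the umlaut diaeresis.
        String stripped = decomposed.replaceAll(MARKS_TO_STRIP, "");

        // 3. Recompose so that base letter + U+0308 is a single umlaut character again.
        String recomposed = Normalizer.normalize(stripped, Normalizer.Form.NFC);

        // 4. Assumption: map letters without a decomposition by hand
        //    (ae ligature and o with stroke, lower and upper case).
        return recomposed
            .replace("\u00e6", "ae")
            .replace("\u00c6", "AE")
            .replace("\u00f8", "o")
            .replace("\u00d8", "O");
    }

    public static void main(String[] args) {
        // The sample input from the question, written with escapes
        // so the source file stays pure ASCII.
        String source = "\u00fc\u00f6\u00e4\u00e2\u00c7\u00e6\u00f4\u00f8\u00f1\u00c1";
        System.out.println(normalize(source));
    }
}
```

For the sample input from the question, this should yield the desired üöäaCaeoonA. Note that a diaeresis on any other letter (for example ë or ï) is stripped, and ß is left as it is.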