icu, transliteration

Why is there NFC in NFD; [:Nonspacing Mark:] Remove; NFC?


On http://userguide.icu-project.org/transforms/general one can read:

to remove accents from characters, use the following transform:

NFD; [:Nonspacing Mark:] Remove; NFC.

This transform separates accents from their base characters, removes the accents, and then puts the remaining text into an unaccented form.

NFD performs a canonical decomposition, so why is there a need to recompose once the nonspacing marks have been removed?


Solution

  • Canonical decomposition isn't limited to diacritics. I was given the example of Hangul syllables, which NFD splits into several jamo. Those jamo are not nonspacing marks, so the Remove step leaves them in place, and it then makes sense to recompose such characters with NFC.
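
The effect can be reproduced outside ICU. The following is a minimal sketch using Python's standard `unicodedata` module rather than the ICU transliterator itself; it assumes that `[:Nonspacing Mark:]` corresponds to the general category `Mn`, and the `strip_accents` helper and its `recompose` flag are hypothetical names introduced only for illustration.

```python
import unicodedata


def strip_accents(text: str, recompose: bool = True) -> str:
    """Approximate 'NFD; [:Nonspacing Mark:] Remove; NFC' with the stdlib."""
    # Step 1: canonical decomposition (splits accents AND Hangul syllables).
    decomposed = unicodedata.normalize("NFD", text)
    # Step 2: drop nonspacing marks (general category Mn) only.
    stripped = "".join(
        ch for ch in decomposed if unicodedata.category(ch) != "Mn"
    )
    # Step 3: optionally recompose whatever is still decomposed.
    return unicodedata.normalize("NFC", stripped) if recompose else stripped


# An accented Latin letter loses its accent either way.
print(strip_accents("caf\u00e9"))  # cafe

# U+D55C (a Hangul syllable) decomposes into three jamo, none of them Mn.
syllable = "\ud55c"
print(len(strip_accents(syllable, recompose=False)))  # 3
print(len(strip_accents(syllable)))                   # 1
```

Without the final NFC step, the Hangul syllable comes back as three separate jamo code points even though it carried no accent to remove; with it, the original precomposed syllable is restored.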