NFC or NFD - what is the difference?


In Ingres, the DBA has two options when creating a Unicode-aware database: createdb accepts the -i flag for NFC (Normalization Form C) and the -n flag for NFD (Normalization Form D). The documentation makes no distinction between them; the two descriptions are almost identical.

May we assume the two are equivalent, or are there real differences between them?


Solution

  • The difference is whether the characters are composed (C) or decomposed (D).

    Letters with "extra bits" like ä can be represented in different ways. Unicode has a single code point created specifically for a with two dots (U+00E4). That is the composed form, which NFC uses. Alternatively, you can represent it as the usual "a" (U+0061) followed by a combining character that adds the two dots (U+0308). That is the decomposed form, which NFD uses.

    The decomposed form takes more space, but the composed form makes some operations harder, such as comparing strings while ignoring differences in accents.
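
To make the distinction concrete, here is a minimal sketch using Python's standard unicodedata module. It only illustrates the Unicode concept and says nothing about how Ingres stores or compares data internally. The helper names strip_accents and equal_ignoring_accents are made up for this example.

```python
import unicodedata

# Two ways to write the same visible character "ä".
composed = "\u00e4"      # single code point: LATIN SMALL LETTER A WITH DIAERESIS
decomposed = "a\u0308"   # "a" followed by COMBINING DIAERESIS

# The two strings look identical but are different code point sequences.
assert composed != decomposed

# Normalizing converts one form into the other.
assert unicodedata.normalize("NFC", decomposed) == composed
assert unicodedata.normalize("NFD", composed) == decomposed

# The decomposed form takes more space (here measured in UTF-8 bytes).
composed_size = len(composed.encode("utf-8"))      # 2 bytes
decomposed_size = len(decomposed.encode("utf-8"))  # 3 bytes


def strip_accents(text: str) -> str:
    """Decompose the text, then drop every combining mark (hypothetical helper)."""
    decomposed_text = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed_text if not unicodedata.combining(ch))


def equal_ignoring_accents(a: str, b: str) -> bool:
    """Compare two strings while ignoring accents (hypothetical helper)."""
    return strip_accents(a) == strip_accents(b)
```

With decomposed text, ignoring accents is just a matter of skipping the combining marks, which is why strip_accents decomposes first. For example, equal_ignoring_accents("B\u00e4r", "Bar") returns True, while a plain == comparison of the same two strings returns False.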