Let's combine a regular i with a combining acute accent, and normalize the result (using Python's unicodedata.normalize):
from unicodedata import normalize
normalize("NFC", "i\N{COMBINING ACUTE ACCENT}").encode("ascii", "namereplace")
b'\\N{LATIN SMALL LETTER I WITH ACUTE}'
As expected: a small i with the dot swapped out for an acute accent, í.
Let's do the same with a dotless i:
from unicodedata import normalize
normalize("NFC", "\N{LATIN SMALL LETTER DOTLESS I}\N{COMBINING ACUTE ACCENT}").encode("ascii", "namereplace")
b'\\N{LATIN SMALL LETTER DOTLESS I}\\N{COMBINING ACUTE ACCENT}'
As you can see, it does not combine. Other implementations, e.g., this one, do the same.
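To make the comparison easier to repeat, here is a minimal sketch that wraps the check in a helper. The name composes is my own and not part of unicodedata; it only reports whether NFC collapses a base letter plus a combining mark into a single code point.

```python
from unicodedata import normalize

ACUTE = "\N{COMBINING ACUTE ACCENT}"
DOTLESS_I = "\N{LATIN SMALL LETTER DOTLESS I}"


def composes(base: str, mark: str = ACUTE) -> bool:
    """Hypothetical helper: True if NFC turns base + mark into one code point."""
    return len(normalize("NFC", base + mark)) == 1


for base in ("i", DOTLESS_I):
    # Print the code point of the base letter and whether it composed.
    print(f"U+{ord(base):04X}", composes(base))
# U+0069 True
# U+0131 False
```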
Why not? Is this consistent with the Unicode standard?
From The Unicode Standard, Version 14.0, Diacritics on i and j (emphasis mine):
A dotted (normal) i or j followed by some common nonspacing marks above loses the dot in rendering. Thus, in the word naïve, the ï could be spelled with i + diaeresis. A dotted-i is not equivalent to a Turkish dotless-i + overdot, nor are other cases of accented dotted-i equivalent to accented dotless-i (for example, i + ¨ ≠ ı + ¨).
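The non-equivalence in that passage can be checked against the character data Python ships with. The sketch below uses only unicodedata.decomposition and unicodedata.normalize; the variable names are mine. As far as I can tell, the precomposed letters decompose to a dotted i plus the mark, and the dotless i has no decomposition mapping at all, so NFC has nothing to compose it into.

```python
import unicodedata

DIAERESIS = "\N{COMBINING DIAERESIS}"
DOTLESS_I = "\N{LATIN SMALL LETTER DOTLESS I}"
I_ACUTE = "\N{LATIN SMALL LETTER I WITH ACUTE}"
I_DIAERESIS = "\N{LATIN SMALL LETTER I WITH DIAERESIS}"

# Decomposition mappings from the Unicode Character Database
# (an empty string means the character has no decomposition).
print(repr(unicodedata.decomposition(I_ACUTE)))    # '0069 0301'
print(repr(unicodedata.decomposition(DOTLESS_I)))  # ''

# i + diaeresis composes to the precomposed letter; dotless i + diaeresis is left alone.
dotted = unicodedata.normalize("NFC", "i" + DIAERESIS)
dotless = unicodedata.normalize("NFC", DOTLESS_I + DIAERESIS)
print(dotted == I_DIAERESIS)             # True
print(dotless == DOTLESS_I + DIAERESIS)  # True
print(dotted == dotless)                 # False
```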