
Obj-C: Issue with Unicode character composition involving UTF-8 literals


I'm having trouble composing Unicode characters in Obj-C. The following example code tries to combine 'e' with an acute accent:

NSLog(@"Composing with Unicode literal: '%@'\nComposing with UTF-8 literal: '%@'",
      [[NSString stringWithUTF8String:"e\u0301"]
       precomposedStringWithCanonicalMapping],
       [[NSString stringWithUTF8String:"e\xc2\xb4"] // "\xc2\xb4" is UTF-8 rep of "\u0301"
       precomposedStringWithCanonicalMapping]);

The output is:

Composing with Unicode literal: 'é'
Composing with UTF-8 literal: 'e´'

So the code yields the correct result only when the acute accent is specified as a \u literal, while the UTF-8 byte representation appears to produce the wrong result. My question: is there a way to use UTF-8 bytes nevertheless?


Solution

  • You have the wrong UTF-8 encoding for the combining accent.

    Change \xc2\xb4 to \xcc\x81. This change will give you the expected result.

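    The accent you were using is the non-combining acute accent (U+00B4, UTF-8 bytes \xc2\xb4), not the combining acute accent (U+0301, UTF-8 bytes \xcc\x81), so there is nothing for precomposedStringWithCanonicalMapping to compose with the 'e'.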
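
One way to double-check which bytes a code point maps to is to apply the UTF-8 encoding rule by hand. The following is a minimal sketch in plain C, which should also compile inside an Objective-C source file. The helper names `utf8_encode` and `utf8_matches` are hypothetical and not part of Foundation. The sketch does not reject surrogate code points, and `utf8_matches` cannot be used for U+0000 because it relies on `strlen`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: encode one Unicode code point as UTF-8.
   Writes up to 4 bytes into out and returns the number of bytes written. */
size_t utf8_encode(uint32_t cp, unsigned char out[4])
{
    assert(cp <= 0x10FFFF);                 /* outside the Unicode range */
    if (cp < 0x80) {                        /* 1 byte:  0xxxxxxx */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                       /* 2 bytes: 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {                     /* 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    /* 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx */
    out[0] = (unsigned char)(0xF0 | (cp >> 18));
    out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
    out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[3] = (unsigned char)(0x80 | (cp & 0x3F));
    return 4;
}

/* Hypothetical helper: returns 1 if the escaped byte literal is exactly
   the UTF-8 encoding of cp, and 0 otherwise. */
int utf8_matches(uint32_t cp, const char *literal)
{
    unsigned char buf[4];
    size_t n = utf8_encode(cp, buf);
    return strlen(literal) == n && memcmp(literal, buf, n) == 0;
}
```

With these helpers, `utf8_matches(0x0301, "\xcc\x81")` is true, while `utf8_matches(0x0301, "\xc2\xb4")` is false, because those bytes are the encoding of U+00B4.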