c++ unicode utf-8 normalization unicode-normalization

OSX and C++ unicode conversion from NFD to NFC


I have a problem with NFD Unicode strings I get from the OSX Filesystem.

For the umlaut "Ä", OS X gives me "A\xcc\x88" (the letter A followed by a combining diaeresis), but I expect "\xc3\x84" (the single precomposed character). The same code gives the expected result on Windows (a simple Boost.Filesystem operation that lists a directory).

After searching for a while, I found out that Apple uses the NFD (decomposed) form for UTF-8 filenames, while the rest of the world uses NFC (precomposed). I tried converting through NSString and with boost::locale::normalize, but without success.

Does anybody know a way to do this in C++? (I can use Cocoa through Objective-C if necessary.)

Afterwards, I would like to have the result as a std::string (UTF-8 encoded).


Solution

  • This converts a filename to the precomposed form (NFC) using Core Foundation.

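    #include <CoreFoundation/CoreFoundation.h>
    #include <string>
    #include <vector>

    std::string precomposeFilename(const std::string& name)
    {
       // Interpret the input as UTF-8; this returns NULL if the bytes are not valid UTF-8.
       CFStringRef cfStringRef = CFStringCreateWithCString(kCFAllocatorDefault, name.c_str(), kCFStringEncodingUTF8);
       if (!cfStringRef) return name;   // fall back to the unmodified input
       CFMutableStringRef cfMutable = CFStringCreateMutableCopy(kCFAllocatorDefault, 0, cfStringRef);
       CFRelease(cfStringRef);
    
       // Normalize in place to the precomposed form (NFC).
       CFStringNormalize(cfMutable, kCFStringNormalizationFormC);
    
       // Size the buffer for the worst case instead of assuming at most 255 bytes.
       CFIndex maxSize = CFStringGetMaximumSizeForEncoding(CFStringGetLength(cfMutable), kCFStringEncodingUTF8) + 1;
       std::vector<char> buffer(maxSize);
       const bool ok = CFStringGetCString(cfMutable, buffer.data(), maxSize, kCFStringEncodingUTF8);
       CFRelease(cfMutable);
    
       // If the conversion failed, return the input rather than reading an unfilled buffer.
       return ok ? std::string(buffer.data()) : name;
    }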
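
To check which form a filename is actually in, it can help to look at the code points rather than the raw bytes. The following is only a minimal sketch: `decodeUtf8` is a hypothetical helper (not part of Core Foundation or Boost), and it assumes the input is well-formed UTF-8 and does no validation.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper: decodes well-formed UTF-8 into Unicode code points.
// No validation is done; malformed input yields meaningless code points.
std::vector<std::uint32_t> decodeUtf8(const std::string& s)
{
    std::vector<std::uint32_t> out;
    for (std::size_t i = 0; i < s.size();) {
        const unsigned char lead = static_cast<unsigned char>(s[i]);
        std::uint32_t cp;
        std::size_t len;
        // The lead byte determines the sequence length and the payload bits.
        if (lead < 0x80)      { cp = lead;        len = 1; }
        else if (lead < 0xE0) { cp = lead & 0x1F; len = 2; }
        else if (lead < 0xF0) { cp = lead & 0x0F; len = 3; }
        else                  { cp = lead & 0x07; len = 4; }
        // Each continuation byte contributes its low six bits.
        for (std::size_t k = 1; k < len && i + k < s.size(); ++k)
            cp = (cp << 6) | (static_cast<unsigned char>(s[i + k]) & 0x3F);
        out.push_back(cp);
        i += len;
    }
    return out;
}
```

With the strings from the question, `decodeUtf8("A\xcc\x88")` yields two code points, U+0041 and U+0308 (the NFD form), while `decodeUtf8("\xc3\x84")` yields the single code point U+00C4 (the NFC form). Running it on the output of `precomposeFilename` is one way to confirm that the normalization took effect.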