
What is String.Encoding.unicode?


Swift offers a series of encodings for strings. As of the time I'm writing this, none of them are documented, which makes this absurdly more confusing than it should be...

I can understand that .ascii means it's ASCII encoded, .utf8 means the string is UTF-8 encoded, and .utf16BigEndian means the string is UTF-16 but big-endian. These obviously map to real text encodings.

Then there's .unicode. There is no "Unicode" encoding. The Unicode standard defines UTF-8, UTF-16, and UTF-32, which, as I said above, are already defined in Swift.

Is it a fancy one which figures out the best one for the system? Is it an alias for .utf8? Is it some weird Apple Unicode encoding?


Solution

  • It would appear to be an alias for .utf16. From CFString.h:

    #define kCFStringEncodingInvalidId (0xffffffffU)
    typedef CF_ENUM(CFStringEncoding, CFStringBuiltInEncodings) {
        kCFStringEncodingMacRoman = 0,
        kCFStringEncodingWindowsLatin1 = 0x0500, /* ANSI codepage 1252 */
        kCFStringEncodingISOLatin1 = 0x0201, /* ISO 8859-1 */
        kCFStringEncodingNextStepLatin = 0x0B01, /* NextStep encoding*/
        kCFStringEncodingASCII = 0x0600, /* 0..127 (in creating CFString, values greater than 0x7F are treated as corresponding Unicode value) */
        kCFStringEncodingUnicode = 0x0100, /* kTextEncodingUnicodeDefault  + kTextEncodingDefaultFormat (aka kUnicode16BitFormat) */
        kCFStringEncodingUTF8 = 0x08000100, /* kTextEncodingUnicodeDefault + kUnicodeUTF8Format */
        kCFStringEncodingNonLossyASCII = 0x0BFF, /* 7bit Unicode variants used by Cocoa & Java */
    
        kCFStringEncodingUTF16 = 0x0100, /* kTextEncodingUnicodeDefault + kUnicodeUTF16Format (alias of kCFStringEncodingUnicode) */
        kCFStringEncodingUTF16BE = 0x10000100, /* kTextEncodingUnicodeDefault + kUnicodeUTF16BEFormat */
        kCFStringEncodingUTF16LE = 0x14000100, /* kTextEncodingUnicodeDefault + kUnicodeUTF16LEFormat */
    
        kCFStringEncodingUTF32 = 0x0c000100, /* kTextEncodingUnicodeDefault + kUnicodeUTF32Format */
        kCFStringEncodingUTF32BE = 0x18000100, /* kTextEncodingUnicodeDefault + kUnicodeUTF32BEFormat */
        kCFStringEncodingUTF32LE = 0x1c000100 /* kTextEncodingUnicodeDefault + kUnicodeUTF32LEFormat */
    };
    

    You can confirm this with:

    import Foundation
    print(String.Encoding.unicode.rawValue, String.Encoding.utf16.rawValue)  // prints "10 10"
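The raw values printed above are Foundation's NSStringEncoding numbers, not the CFStringEncoding constants quoted from the header, so it may be worth checking the alias from the Swift side as well. Here is a minimal sketch of one way to do that, assuming Foundation is available. The sample string and the variable names are made up for illustration.

```swift
import Foundation

// Arbitrary sample text, chosen only for illustration.
let sample = "héllo"

// String.Encoding is Equatable, so the two cases can be compared directly.
let sameCase = String.Encoding.unicode == String.Encoding.utf16

// Encode the same string with each case and compare the resulting bytes.
let viaUnicode = sample.data(using: .unicode)
let viaUTF16 = sample.data(using: .utf16)
let sameBytes = viaUnicode != nil && viaUnicode == viaUTF16

print(sameCase, sameBytes)
```

If both values come out true, then .unicode and .utf16 are interchangeable for encoding purposes, which is what the header comment ("alias of kCFStringEncodingUnicode") suggests.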