macos, cocoa, core-text

What's the native UTF encoding for CFString in OS X?


This should be easy to find out, but I can't seem to find it anywhere, so please excuse me if it's a no-brainer. What is the native UTF storage used in a CFString: UTF-16, UTF-8, or something else?

The reason I'm asking is that I'm interfacing with some Lua code. Lua can handle UTF-8 strings, but if I convert them to CFString, will there be a performance penalty if it uses UTF-16 internally?

I had a look at CFStringGetSystemEncoding, and it returns MacRoman, which doesn't seem to be correct.

locale returns

LANG="en_AU.UTF-8"...

which indicates it's UTF-8, but the docs seem to indicate it's 16-bit?

tia


Solution

  • There is a short chapter, “String storage”, in the docs saying that a CFString can use various encodings internally:

    Although conceptually CFString objects store strings as arrays of Unicode characters, in practice they often store them more efficiently. The memory a CFString object requires to represent a string could often be less than that required by a simple UniChar array.

    The last paragraph suggests a solution for those who are concerned about extra conversions:

    You can get further control over the backing store of a string with the CFStringCreateMutableWithExternalCharactersNoCopy function. This function creates a reference to a mutable CFString object but allows you to retain full ownership of the Unicode buffer holding the object’s characters; the object itself points to the buffer as its backing store.