ios objective-c unicode utf-8 ucs2

How can I convert a UTF-8 value to a UCS-2 value in Objective-C?


I am trying to convert a UTF-8 string into a UCS-2 string. I need to get a string like "\uFF0D\uFF0D\u6211\u7684\u4E0A\u7F51\u4E3B\u9875". I have been searching for about a month, but I still cannot find any reference on converting UTF-8 to UCS-2. Could someone please help? Thanks in advance.

EDIT: Okay, maybe my explanation was not good enough. Here is what I am trying to do. I live in Korea, and I am trying to send an SMS message using CTMessageCenter. I tried to send Simplified Chinese characters through my app, and I get ???? instead of the proper characters. So I tried UTF-8 and UTF-16, both BE and LE, but they all come out as ??. Finally I found out that SMS uses the UCS-2 and EUC-KR encodings in Korea. Weird, isn't it? Anyway, I tried sending a string like \u4E3B\u9875 and it worked. So I need to convert the string into UCS-2 first and then get the escaped string literal from it.


Solution

  • Wikipedia:

    The older UCS-2 (2-byte Universal Character Set) is a similar character encoding that was superseded by UTF-16 in version 2.0 of the Unicode standard in July 1996. It produces a fixed-length format by simply using the code point as the 16-bit code unit and produces exactly the same result as UTF-16 for 96.9% of all the code points in the range 0-0xFFFF, including all characters that had been assigned a value at that time.

    IBM:

    Since the UCS-2 standard is limited to 65,535 characters, and the data processing industry needs over 94,000 characters, the UCS-2 standard is in the process of being superseded by the Unicode UTF-16 standard.

    However, because UTF-16 is a superset of the existing UCS-2 standard, you can develop your applications using the system's existing UCS-2 support as long as your applications treat the UCS-2 as if it were UTF-16.

    unicode.org:

    UCS-2 is obsolete terminology which refers to a Unicode implementation up to Unicode 1.1, before surrogate code points and UTF-16 were added to Version 2.0 of the standard. This term should now be avoided.

    UCS-2 does not define a distinct data format, because UTF-16 and UCS-2 are identical for purposes of data exchange. Both are 16-bit, and have exactly the same code unit representation.

    So, using the "UTF8toUnicode" transformation in most language libraries will produce UTF-16, which is identical to UCS-2 for every character in the Basic Multilingual Plane (U+0000 to U+FFFF). Simply extracting the 16-bit code units (unichar values) from an Objective-C string accomplishes the same thing.

    In other words, the solution has been staring you in the face all along.
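
As a rough illustration of "extracting the 16-bit characters", here is a minimal sketch that walks an NSString one UTF-16 code unit at a time and writes each unit as a \uXXXX escape. The function name `UCS2EscapedString` is hypothetical and chosen only for this example. The sketch assumes the text stays within the Basic Multilingual Plane. A character outside it would come out as two surrogate escapes, which strict UCS-2 cannot represent.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper (the name is illustrative, not an Apple API):
// returns the UTF-16 code units of `input` written as \uXXXX escapes.
static NSString *UCS2EscapedString(NSString *input) {
    NSMutableString *escaped =
        [NSMutableString stringWithCapacity:input.length * 6];
    for (NSUInteger i = 0; i < input.length; i++) {
        // characterAtIndex: returns one 16-bit UTF-16 code unit (unichar).
        unichar unit = [input characterAtIndex:i];
        // "\\u" is a literal backslash followed by 'u'.
        [escaped appendFormat:@"\\u%04X", (unsigned int)unit];
    }
    return escaped;
}
```

For example, decoding the UTF-8 bytes of 主页 with `[NSString stringWithUTF8String:"\xE4\xB8\xBB\xE9\xA1\xB5"]` and passing the result to the helper gives `\u4E3B\u9875`, the form that worked in the question. If raw bytes are needed instead of escapes, `dataUsingEncoding:` with `NSUTF16BigEndianStringEncoding` or `NSUTF16LittleEndianStringEncoding` returns the same code units as NSData.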