
Convert UTF-16 numbers to UTF-8?


I have an iPad app where the user enters a phone number in a text field. Sometimes the phone number is entered in UTF-16 (Japanese users sometimes enter their phone numbers this way), but most of the time in UTF-8.

My question is three-part:

  • is there a way I can tell if the number is UTF-8 or UTF-16?
  • how do I convert from UTF-16 to UTF-8, given the number is numeric?
  • having looked and found nothing, anyone know of a good treatise on this subject? (converting back and forth in iOS).

Solution

  • All Unicode encodings can be converted into one another without loss. UTF-8 is just another encoding of the same code points as UTF-16. The main reason East Asian users use UTF-16 more often than UTF-8 is that it is more space-efficient for East Asian text: most CJK characters take two bytes in UTF-16 but three in UTF-8.

    Conversion between Unicode encodings is more or less straightforward: Unicode assigns each character a code point, and code points are encoded into byte streams in an encoding-specific way. So what you must do is decode the UTF-16 byte stream into individual Unicode code points and then re-encode them as a UTF-8 byte stream.
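
The decode-then-re-encode step described above can be sketched in plain C, which also compiles inside an Objective-C project. This is a minimal illustration rather than a substitute for a tested library: the function names (`utf16_decode`, `utf8_encode`, `utf16_to_utf8`) are made up for this example, the input is assumed to be UTF-16 code units already in host byte order with no BOM, and unpaired surrogates are replaced with U+FFFD.

```c
#include <stddef.h>
#include <stdint.h>

/* Decode one code point from UTF-16 code units (host byte order assumed).
   Returns the number of code units consumed: 1 or 2.
   Unpaired surrogates are replaced with U+FFFD. */
static size_t utf16_decode(const uint16_t *in, size_t len, uint32_t *cp)
{
    uint16_t u = in[0];
    if (u >= 0xD800 && u <= 0xDBFF) {               /* high surrogate */
        if (len >= 2 && in[1] >= 0xDC00 && in[1] <= 0xDFFF) {
            *cp = 0x10000u + (((uint32_t)(u - 0xD800) << 10)
                              | (uint32_t)(in[1] - 0xDC00));
            return 2;
        }
        *cp = 0xFFFD;                               /* missing low surrogate */
        return 1;
    }
    if (u >= 0xDC00 && u <= 0xDFFF) {               /* stray low surrogate */
        *cp = 0xFFFD;
        return 1;
    }
    *cp = u;                                        /* ordinary BMP character */
    return 1;
}

/* Encode one code point as UTF-8. Returns the number of bytes written (1-4). */
static size_t utf8_encode(uint32_t cp, uint8_t out[4])
{
    if (cp < 0x80) {
        out[0] = (uint8_t)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (uint8_t)(0xC0 | (cp >> 6));
        out[1] = (uint8_t)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        out[0] = (uint8_t)(0xE0 | (cp >> 12));
        out[1] = (uint8_t)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (uint8_t)(0x80 | (cp & 0x3F));
        return 3;
    }
    out[0] = (uint8_t)(0xF0 | (cp >> 18));
    out[1] = (uint8_t)(0x80 | ((cp >> 12) & 0x3F));
    out[2] = (uint8_t)(0x80 | ((cp >> 6) & 0x3F));
    out[3] = (uint8_t)(0x80 | (cp & 0x3F));
    return 4;
}

/* Convert in_len UTF-16 code units to UTF-8 bytes in out.
   Returns the number of bytes written, or (size_t)-1 if out is too small. */
size_t utf16_to_utf8(const uint16_t *in, size_t in_len,
                     uint8_t *out, size_t out_cap)
{
    size_t i = 0, o = 0;
    while (i < in_len) {
        uint32_t cp;
        uint8_t buf[4];
        i += utf16_decode(in + i, in_len - i, &cp);
        size_t n = utf8_encode(cp, buf);
        if (o + n > out_cap)
            return (size_t)-1;
        for (size_t k = 0; k < n; k++)
            out[o++] = buf[k];
    }
    return o;
}
```

A buffer of three bytes per input code unit is always large enough, because a surrogate pair (two units) produces only four bytes.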

    is there a way I can tell if the number is UTF-8 or UTF-16?

    That's not what you're looking for. You want to know the encoding of the character string.
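
One common hint about a byte stream's encoding is a leading byte order mark (BOM). The sketch below checks only for that, using a hypothetical `detect_bom` helper and enum invented for this example. Many strings carry no BOM at all, in which case the encoding has to be known from context (a text field on iOS hands you an `NSString`, not raw bytes), so treat this as a heuristic and not as reliable detection.

```c
#include <stddef.h>
#include <stdint.h>

typedef enum {
    ENC_UNKNOWN,   /* no recognised BOM; encoding must come from context */
    ENC_UTF8,
    ENC_UTF16LE,
    ENC_UTF16BE
} bom_encoding;

/* Inspect the first bytes of buf for a byte order mark.
   Note: FF FE is also the start of the UTF-32LE BOM (FF FE 00 00);
   this sketch does not try to tell the two apart. */
bom_encoding detect_bom(const uint8_t *buf, size_t len)
{
    if (len >= 3 && buf[0] == 0xEF && buf[1] == 0xBB && buf[2] == 0xBF)
        return ENC_UTF8;
    if (len >= 2 && buf[0] == 0xFF && buf[1] == 0xFE)
        return ENC_UTF16LE;
    if (len >= 2 && buf[0] == 0xFE && buf[1] == 0xFF)
        return ENC_UTF16BE;
    return ENC_UNKNOWN;
}
```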

    how do I convert from UTF-16 to UTF-8

    Preferably by using a tested Unicode library like ICU. libiconv may also be useful to you, but mind the license.