c++ character-encoding char utf-16

Create UTF-16 string from char*


So I have standard C string:

const char* name = "Jakub";

And I want to convert it to UTF-16. I figured that the UTF-16 version will be twice as long, since each character takes two chars.
So I create another string:

char name_utf_16[10];  //"Jakub" is 5 characters

Now, I believe that for ASCII characters only the low byte is used, so each of them will look like 4A 00 (hex) for J and so on. Based on that, I wrote this code:

void charToUtf16(char* input, char* output, int length) {
    /*Todo: how to check if output is long enough?*/
    for(int i=0; i<length; i+=2)  //Step over 2 bytes
    {
        //Lets use little-endian - smallest bytes first
        output[i] = input[i];
        output[i+1] = 0;  //We will never have any data for this field
    }
}

But with this code I ended up with "Jkb". I don't know how to test this properly; I just sent the string to a Minecraft Bukkit server, and this is what it logged on disconnect:

13:34:19 [INFO] Disconnecting jkb?? [/127.0.0.1:53215]: Outdated server!

Note: I'm aware that Minecraft uses big-endian. The code above is just an example; my actual conversion is implemented in a class.


Solution

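  • Why do you want to write your own Unicode conversion when there are existing C/C++ functions for this, such as mbstowcs(), which is declared in <cstdlib>? Keep in mind that mbstowcs() produces wchar_t, so the result is UTF-16 only on platforms where wchar_t is 16 bits wide (Windows, for example).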

    If you still want to write your own, have a look at the Unicode Consortium's open-source code, which can be found here:

    Convert UTF-16 to UTF-8 under Windows and Linux, in C
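
    As for the "Jkb" result in the question: the loop advances i by 2 and uses that same i to index both input and output, so only input[0], input[2] and input[4] are ever copied. If you do roll your own for the ASCII-only case, a minimal sketch might look like the one below. The name asciiToUtf16 and its parameters are made up for illustration; it assumes pure ASCII input and refuses anything else, so it is not a general UTF-16 encoder.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical helper (not from the question): widens an ASCII-only C string
// into UTF-16 code units written as raw bytes, without a terminator.
// Returns false if the output buffer is too small or a non-ASCII byte is found.
bool asciiToUtf16(const char* input, unsigned char* output,
                  std::size_t outputSize, bool bigEndian) {
    const std::size_t length = std::strlen(input);
    if (outputSize < length * 2) {
        return false;  // two output bytes are needed per input character
    }
    for (std::size_t i = 0; i < length; ++i) {  // walk the input one char at a time
        const unsigned char c = static_cast<unsigned char>(input[i]);
        if (c > 0x7F) {
            return false;  // outside ASCII: this needs a real decoder
        }
        // Input index i maps to output indices 2*i and 2*i + 1.
        output[2 * i]     = bigEndian ? 0 : c;
        output[2 * i + 1] = bigEndian ? c : 0;
    }
    return true;
}
```

    Called with bigEndian set to true on "Jakub" and a 10-byte buffer, this fills the buffer with 00 4A 00 61 00 6B 00 75 00 62. Passing the buffer size also answers the "is output long enough?" TODO from the question.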