Convert from char to char16_t


My config:

  • Compiler: GNU GCC 4.8.2
  • I compile with C++11
  • platform/OS: Linux 64bit Ubuntu 14.04.1 LTS

I have this method:

static inline std::u16string StringtoU16(const std::string &str) {
    const size_t si = strlen(str.c_str());
    char16_t cstr[si+1];
    memset(cstr, 0, (si+1)*sizeof(char16_t));
    const char* constSTR = str.c_str();
    mbstate_t mbs;
    memset (&mbs, 0, sizeof (mbs));//set shift state to the initial state
    size_t ret = mbrtoc16 (cstr, constSTR, si, &mbs);
    std::u16string wstr(cstr);
    return wstr;
}

I want a conversion from char to char16_t (via std::string and std::u16string, to simplify memory management), but regardless of the length of the input str, it returns only the first character. If str = "Hello", it returns "H". I am not sure what is wrong with my method. The value of ret is 1.


Solution

  • I didn't know that mbrtoc16() handles only one character per call... what a turtle. Here is the code I came up with, which works like a charm:

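    // Needs <uchar.h>, <cstring> and <string>. The source encoding is that of
    // the current locale, so call setlocale(LC_ALL, "") first for UTF-8 input.
    static inline std::u16string StringtoU16(const std::string &str) {
        std::u16string wstr;
        const char *p = str.data();
        const char *end = p + str.size();
        mbstate_t mbs;
        memset (&mbs, 0, sizeof (mbs));//set shift state to the initial state
        while (p < end){
            char16_t c16;
            const size_t ret = mbrtoc16 (&c16, p, end - p, &mbs);
            if (ret == (size_t)-2) break;//truncated character at the end of the input
            if (ret == (size_t)-1) { memset (&mbs, 0, sizeof (mbs)); ++p; continue; }//skip an invalid byte
            wstr.push_back(c16);
            if (ret == (size_t)-3) continue;//second half of a surrogate pair, no input consumed
            p += (ret == 0) ? 1 : ret;//ret == 0 means a '\0' was converted
        }//while
        return wstr;
    }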
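
To see the one-character-per-call behaviour in isolation, here is a minimal sketch. The helper names `firstCharLength` and `demoFirstChar` are made up for illustration, and the code assumes a C library that ships `<uchar.h>` (glibc does). It converts only the first character of its input and returns the number of bytes mbrtoc16 consumed, which is the `ret` value observed in the question.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <uchar.h>

// Hypothetical helper: converts only the first multibyte character of `s`
// and returns how many bytes mbrtoc16 consumed for it.
std::size_t firstCharLength(const char *s, char16_t &out) {
    mbstate_t mbs;
    std::memset(&mbs, 0, sizeof(mbs)); // initial shift state
    return mbrtoc16(&out, s, std::strlen(s), &mbs);
}

// Usage: for plain ASCII input, one call yields one char16_t
// and consumes one byte, no matter how long the input is.
void demoFirstChar() {
    char16_t c = 0;
    const std::size_t used = firstCharLength("Hello", c);
    assert(used == 1);  // one byte consumed, like ret == 1 in the question
    assert(c == u'H');  // only the first character was converted
}
```

Converting a whole string therefore means calling mbrtoc16 in a loop and advancing the input pointer by the returned byte count each time.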
    

    And its counterpart (when one direction is needed, sooner or later the other one will be too):

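    // Needs <uchar.h>, <climits>, <cstring> and <string>.
    static inline std::string U16toString(const std::u16string &wstr) {
        std::string str;
        char cstr[MB_LEN_MAX];//large enough for any single multibyte character
        mbstate_t mbs;
        memset (&mbs, 0, sizeof (mbs));//set shift state to the initial state
        for (const auto& it: wstr){
            const size_t ret = c16rtomb (cstr, it, &mbs);
            if (ret == (size_t)-1) { memset (&mbs, 0, sizeof (mbs)); continue; }//not convertible, skip it
            str.append(cstr, ret);//c16rtomb does not null-terminate; ret may be 0 inside a surrogate pair
        }//for
        return str;
    }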
    

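    Be aware that c16rtomb is lossy when a character cannot be converted from char16_t to char in the current locale (you might end up with a bunch of '?' or missing characters, depending on your system), but it will run without complaints. A failed conversion is only signalled through the return value (size_t)-1, with errno set to EILSEQ.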
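
If the silent loss matters, the return value of c16rtomb can be checked. The following is only a sketch under assumptions: `U16toStringCounted` and `demoCounted` are hypothetical names, not part of any library, and `<uchar.h>` is assumed to be available. It counts the char16_t units that could not be converted instead of dropping them unnoticed.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>
#include <cstring>
#include <string>
#include <uchar.h>

// Hypothetical variant of U16toString that reports how many char16_t
// units c16rtomb could not convert in the current locale.
std::string U16toStringCounted(const std::u16string &in, std::size_t &failures) {
    std::string out;
    char buf[MB_LEN_MAX];                // room for one multibyte character
    mbstate_t mbs;
    std::memset(&mbs, 0, sizeof(mbs));   // initial shift state
    failures = 0;
    for (char16_t c : in) {
        const std::size_t ret = c16rtomb(buf, c, &mbs);
        if (ret == static_cast<std::size_t>(-1)) {
            ++failures;                          // not representable in this locale
            std::memset(&mbs, 0, sizeof(mbs));   // start over from a clean state
            continue;
        }
        out.append(buf, ret);            // ret may be 0 for the first half of a surrogate pair
    }
    return out;
}

// Usage: plain ASCII text is expected to convert without any failure
// in the usual ASCII-compatible locales.
void demoCounted() {
    std::size_t failures = 0;
    const std::string s = U16toStringCounted(u"Hello", failures);
    assert(s == "Hello");
    assert(failures == 0);
}
```

A caller can then decide whether a non-zero failure count should be treated as an error or whether a best-effort result is acceptable.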