Tags: c, unicode, char, wchar-t

Problem converting char to wchar_t (length wrong)


I am trying to create a simple data structure that makes it easy to convert back and forth between ASCII strings and Unicode strings. My issue is that the length returned by mbstowcs is correct, but the length returned by wcslen on the newly created wchar_t string is not. Am I missing something here?

typedef struct{

    wchar_t *string;
    long length; // I have also tried int, and size_t
} String;

void setCString(String *obj, char *str){

    obj->length = strlen(str);

    free(obj->string); // Free original string
    obj->string = (wchar_t *)malloc((obj->length + 1) * sizeof(wchar_t)); //Allocate space for new string to be copied to

    //memset(obj->string,'\0',(obj->length + 1)); NOTE: I tried this but it doesn't make any difference

    size_t length = 0;

    length = mbstowcs(obj->string, (const char *)str, obj->length);

    printf("Length = %d\n",(int)length); // Prints correct length
    printf("!C string %s converted to wchar string %ls\n",str,obj->string); //obj->string is of a wcslen size larger than Length above...

    if(length != wcslen(obj->string))
            printf("Length failure!\n");

    if(length == -1)
    {
        //Conversion failed, set string to NULL terminated character
        free(obj->string);
        obj->string = (wchar_t *)malloc(sizeof(wchar_t));
        obj->string = L'\0';
    }
    else
    {
        //Conversion worked! but wcslen (and printf("%ls)) show the string is actually larger than length
        //do stuff
    }
}

Solution

  • The length you pass to mbstowcs() is the maximum number of wide characters it may write, and that count has to leave room for the L'\0' terminator. Your obj->length does not include the terminator, so mbstowcs() stops after the last character without terminating the string, and wcslen() then reads past the end. You need to add 1 to the value passed to mbstowcs().

    In addition, instead of using strlen(str) to determine the length of the converted string, you should be using mbstowcs(0, str, 0) + 1, because the number of wide characters need not equal the number of bytes. You should also change the type of str to const char * and drop the cast. realloc() can be used in place of a free() / malloc() pair. Overall, it should look like:

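    #include <stdio.h>
    #include <stdlib.h>
    #include <wchar.h>

    typedef struct {
        wchar_t *string;   // must be NULL or a malloc'd pointer before the first call
        size_t length;
    } String;

    void setCString(String *obj, const char *str)
    {
        // Measure first: wide length excluding the terminator, or (size_t)-1 on failure
        size_t length = mbstowcs(0, str, 0);

        // Checks of the realloc() results are omitted for brevity
        if (length == (size_t)-1)
        {
            //Conversion failed, set string to an empty, null-terminated string
            obj->string = realloc(obj->string, sizeof(wchar_t));
            obj->string[0] = L'\0';
            obj->length = 0;
            return;
        }

        obj->length = length;
        obj->string = realloc(obj->string, (obj->length + 1) * sizeof(wchar_t));

        // Pass length + 1 so that mbstowcs() also writes the L'\0' terminator
        length = mbstowcs(obj->string, str, obj->length + 1);

        printf("Length = %zu\n", length);
        printf("!C string %s converted to wchar string %ls\n", str, obj->string);

        if (length != wcslen(obj->string))
                printf("Length failure!\n");

        //Conversion worked!
        //do stuff
    }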
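
To see the terminator issue in isolation, here is a minimal sketch. The helper name terminator_written is hypothetical and not part of the original code, and it assumes plain ASCII input in the default "C" locale, so that strlen(str) equals the number of wide characters. It pre-fills the buffer with a sentinel, converts with a caller-chosen limit, and reports whether a L'\0' was written after the last character.

```c
#include <stdlib.h>
#include <string.h>

// Hypothetical helper, for illustration only.
// Converts an ASCII string with mbstowcs() using the limit n, then reports
// whether a L'\0' terminator sits right after the converted characters.
// Returns 1 if terminated, 0 if not, -1 on allocation or conversion failure.
// Assumes str is plain ASCII, so strlen(str) equals the wide length.
int terminator_written(const char *str, size_t n)
{
    size_t len = strlen(str);
    wchar_t *buf = malloc((len + 1) * sizeof *buf);
    if (buf == NULL)
        return -1;

    // Pre-fill with a non-zero sentinel so a missing terminator is visible
    // (freshly malloc'd memory may or may not happen to contain zeros).
    for (size_t i = 0; i <= len; i++)
        buf[i] = L'#';

    size_t converted = mbstowcs(buf, str, n);
    int result;
    if (converted == (size_t)-1)
        result = -1;
    else
        result = (buf[len] == L'\0');

    free(buf);
    return result;
}
```

Under those assumptions, terminator_written("hello", 5) returns 0 and terminator_written("hello", 6) returns 1, which is the difference between passing obj->length and obj->length + 1.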
    

    Mark Benningfield points out that mbstowcs(0, str, 0) is a POSIX / XSI extension to the C standard. To obtain the required length using only standard C, you must instead use mbsrtowcs() from <wchar.h>:

        const char *src_copy = str;
        obj->length = mbsrtowcs(NULL, &src_copy, 0, NULL);
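
Putting the standard C approach together, a two-pass conversion might look like the sketch below. The function name widen and its out_len parameter are hypothetical, introduced only for illustration. It uses an explicit mbstate_t rather than passing NULL, which is a design choice made here and not something the original code does.

```c
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

// Hypothetical helper, for illustration only.
// Returns a newly allocated wide copy of str (the caller frees it), or NULL
// on conversion or allocation failure. If out_len is non-NULL, it receives
// the number of wide characters, not counting the terminator.
wchar_t *widen(const char *str, size_t *out_len)
{
    mbstate_t state;
    memset(&state, 0, sizeof state);   // zeroed state == initial conversion state

    // Pass 1: with a NULL destination, mbsrtowcs() only counts characters.
    const char *src = str;
    size_t len = mbsrtowcs(NULL, &src, 0, &state);
    if (len == (size_t)-1)
        return NULL;                   // invalid multibyte sequence

    wchar_t *out = malloc((len + 1) * sizeof *out);
    if (out == NULL)
        return NULL;

    // Pass 2: convert for real; len + 1 leaves room for the L'\0' terminator.
    src = str;
    memset(&state, 0, sizeof state);
    if (mbsrtowcs(out, &src, len + 1, &state) == (size_t)-1) {
        free(out);
        return NULL;
    }

    if (out_len != NULL)
        *out_len = len;
    return out;
}
```

A caller would use it as wchar_t *w = widen("hello", &n); and later free(w). For input beyond ASCII, the program would first need to select a suitable locale with setlocale(), which this sketch leaves to the caller.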