
Convert UTF-16LE to UTF-8 with iconv()


I'm trying to convert UTF-16LE to UTF-8 with iconv() on Linux, and I think I'm almost there.

But I'm having trouble with my code. I think the two versions below are equivalent, but the first one does not work; only the second one does.

#include <stdio.h>
#include <string.h>
#include <iconv.h>
#include <errno.h>
#if 1
int fn2Utf8(char inBuf[], char outBuf[]) {
    size_t readBytes = sizeof(inBuf);
    size_t writeBytes = sizeof(outBuf);
    char* in = inBuf;
    char* out = outBuf;

    iconv_t convert = iconv_open("UTF-8","UTF-16LE");
    if (iconv(convert, &in, &readBytes, &out, &writeBytes) < 0) {
        return (-1);
    }
    iconv_close(convert);
    printf("[%s] [%s]\n", inBuf, outBuf);
    return (out - outBuf);
}
int main() {
    char inBuf[128] ="\x5c\x00\xbd\xac\x01\xc6\x00\xd3\x5c\x00\x00\xb3\x78\xc6\x44\xbe\x5c\x00\x2a\x00\x00\x00";
    char outBuf[128];
    fn2Utf8(inBuf, outBuf);
    return 0;
}
#else
int main() {
    char inBuf[128] = "\x5c\x00\xbd\xac\x01\xc6\x00\xd3\x5c\x00\x00\xb3\x78\xc6\x44\xbe\x5c\x00\x2a\x00\x00\x00";
    char outBuf[128];
    size_t readBytes = sizeof(inBuf);
    size_t writeBytes = sizeof(outBuf);
    char* in = inBuf;
    char* out = outBuf;

    iconv_t convert = iconv_open("UTF-8","UTF-16LE");
    if (iconv(convert, &in, &readBytes, &out, &writeBytes) < 0) {
        return (-1);
    }
    iconv_close(convert);
    printf("[%s] [%s]\n", inBuf, outBuf);
    return 0;
}
#endif

You can switch between the two versions by changing #if 1 to #if 0.

I need the #if 1 version (the one that uses a separate function) to work.


Solution

  • Here's the problem:

    size_t readBytes = sizeof(inBuf);
    size_t writeBytes = sizeof(outBuf);
    

    When you pass arrays to a function, they decay to pointers to their first element. Your call

    fn2Utf8(inBuf, outBuf);
    

    is equal to

    fn2Utf8(&inBuf[0], &outBuf[0]);
    

    That means that inside the function the parameters are pointers, not arrays. Applying sizeof to a pointer yields the size of the pointer itself (typically 8 bytes on 64-bit Linux), not the size of the buffer it points to.

    There are two ways to fix this. The first is to pass the lengths of the buffers as arguments to the function and use those. The second, for the inBuf argument only, is to compute the length inside the function from a terminator. Note that plain strlen is not suitable for UTF-16LE input, because the data contains zero bytes (the first character here, \x5c\x00, already ends in one) and strlen stops at the first of them; you would have to scan for a two-byte zero terminator instead.

    A terminator scan cannot work for outBuf at all, since it holds no data yet, so its size always has to be passed in as an argument.
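
As a minimal sketch of the first fix (passing the lengths in), the function could look roughly like this. The name fn2Utf8Len and the parameters inLen and outCap are hypothetical and not part of the original code. The sketch also compares the result of iconv() against (size_t)-1, because iconv() returns a size_t and a `< 0` test on an unsigned value can never be true.

```c
#include <iconv.h>
#include <stddef.h>

/* Hypothetical variant of fn2Utf8 that receives both buffer sizes explicitly.
 * Converts inLen bytes of UTF-16LE from inBuf into UTF-8 in outBuf, which has
 * room for outCap bytes, and NUL-terminates the result.
 * Returns the number of UTF-8 bytes written (terminator excluded) or -1. */
ptrdiff_t fn2Utf8Len(char *inBuf, size_t inLen, char *outBuf, size_t outCap)
{
    if (outCap == 0) {
        return -1;                      /* no room even for the terminator */
    }

    iconv_t convert = iconv_open("UTF-8", "UTF-16LE");
    if (convert == (iconv_t)-1) {
        return -1;                      /* conversion not supported */
    }

    char *in = inBuf;
    char *out = outBuf;
    size_t readBytes = inLen;           /* supplied by the caller, not sizeof */
    size_t writeBytes = outCap - 1;     /* keep one byte for the '\0' */

    size_t rc = iconv(convert, &in, &readBytes, &out, &writeBytes);
    iconv_close(convert);

    if (rc == (size_t)-1) {
        return -1;                      /* e.g. E2BIG, EILSEQ or EINVAL in errno */
    }

    *out = '\0';                        /* make outBuf printable with %s */
    return out - outBuf;
}
```

In main the call would then be something like fn2Utf8Len(inBuf, 20, outBuf, sizeof(outBuf)), where 20 is the number of UTF-16LE bytes before the terminating two-byte zero in the question's test string; sizeof(outBuf) is correct there because outBuf is still an array in main.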


    It works in the version without the function because there you are applying sizeof to the array itself, not to a pointer. When the operand is an array rather than a pointer, sizeof gives you the size of the array in bytes.
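
The difference can be seen in isolation with a small sketch. Both function names are hypothetical. The parameter is written as char * here; a parameter declared as char buf[] has exactly the same type, but some compilers warn when sizeof is applied to it.

```c
#include <stddef.h>

/* Inside a function, an "array" parameter is really a pointer, so sizeof
 * reports the size of the pointer (commonly 8 on 64-bit Linux), no matter
 * how large the caller's array is. */
size_t sizeSeenInFunction(char *buf)
{
    return sizeof(buf);
}

/* Where the array itself is declared, sizeof reports the whole array. */
size_t sizeSeenAtDeclaration(void)
{
    char buf[128];
    return sizeof(buf);     /* 128 */
}
```

Calling sizeSeenInFunction with a 128-byte array still returns sizeof(char *), which is why the function version of the program handed iconv() the wrong byte counts.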