Using iconv to convert strings to ISO-8859-1 in C/C++


I want to convert strings from the GBK character set to ISO-8859-1.

I have tried to use the iconv library, but iconv() always returns (size_t)-1, with errno set to EILSEQ ("Invalid or incomplete multibyte or wide character").
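For reference, here is a minimal sketch that reproduces the failure and reports where iconv() gives up. It assumes glibc (or GNU libiconv) with GBK support installed. The helper name first_unconvertible_offset is hypothetical. It opens a strict "ISO-8859-1" target with no suffix, and on EILSEQ it uses the fact that iconv() leaves the input pointer at the character it could not handle. On glibc this typically points at byte 4, the first byte of the first Chinese character.

    #include <errno.h>
    #include <iconv.h>
    #include <stdio.h>
    #include <string.h>

    /* Hypothetical helper: tries a strict GBK -> ISO-8859-1 conversion (no
       //TRANSLIT or //IGNORE) and returns the byte offset at which iconv()
       stopped with EILSEQ, or -1 if no EILSEQ occurred. */
    long first_unconvertible_offset(const char *src)
    {
        char buf[256];
        char *in = (char *)src;   /* iconv() advances this pointer but never writes the input */
        char *out = buf;
        size_t inbytes = strlen(src);
        size_t outbytes = sizeof buf;
        long result = -1;

        iconv_t cd = iconv_open("ISO-8859-1", "GBK");
        if (cd == (iconv_t)-1) {
            perror("iconv_open");
            return -1;
        }

        if (iconv(cd, &in, &inbytes, &out, &outbytes) == (size_t)-1 && errno == EILSEQ)
            result = (long)(in - src);   /* in now points at the offending character */

        iconv_close(cd);
        return result;
    }

    int main(void)
    {
        const char *gbk = "GBK \xB5\xE7\xCA\xD3\xBB\xFA";
        long off = first_unconvertible_offset(gbk);
        if (off >= 0)
            printf("iconv() stopped at byte %ld (lead byte 0x%02X)\n",
                   off, (unsigned)(unsigned char)gbk[off]);
        else
            puts("no EILSEQ");
        return 0;
    }

Plain ASCII input converts cleanly and returns -1, which confirms that only the GBK double-byte characters are the problem.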

How can I achieve this?


Solution

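  • If you open the conversion descriptor without //TRANSLIT or //IGNORE, iconv() fails with EILSEQ as soon as it meets an input character that cannot be represented in the target character set. ISO-8859-1 only covers Western European Latin characters, so it cannot represent most GBK characters, and that is almost certainly what is happening here. Appending //TRANSLIT to the target encoding makes iconv() substitute an approximation (often just ?) instead of failing, while //IGNORE drops such characters. Both suffixes are extensions supported by glibc and GNU libiconv. The following example works for me: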

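    #include <stdio.h>
    #include <string.h>
    #include <iconv.h>
    
    int main(void)
    {
        /* "GBK " followed by three GBK-encoded Chinese characters; an array
           rather than a pointer to a literal, so it is valid C++ as well */
        char gbk_str[] = "GBK \xB5\xE7\xCA\xD3\xBB\xFA";
        char dest_str[100];
        char *in = gbk_str;     /* iconv() advances both pointers */
        char *out = dest_str;
        size_t inbytes = strlen(gbk_str);
        size_t outbytes = sizeof dest_str - 1;   /* reserve room for the terminator */
    
        /* //TRANSLIT: approximate unrepresentable characters instead of failing */
        iconv_t conv = iconv_open("ISO-8859-1//TRANSLIT", "GBK");
        if (conv == (iconv_t)-1) {
            perror("iconv_open");
            return 1;
        }
    
        if (iconv(conv, &in, &inbytes, &out, &outbytes) == (size_t)-1) {
            perror("iconv");
            iconv_close(conv);
            return 1;
        }
    
        *out = '\0';    /* out points just past the last converted byte */
        puts(dest_str);
    
        iconv_close(conv);
        return 0;
    }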
    

    (I hope that GBK string isn't obscene, I have no idea what it means!)
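If you cannot rely on the //TRANSLIT extension, or you want to choose the replacement character yourself, another option is to handle EILSEQ manually: emit a placeholder, skip the offending input character, and resume. Below is a minimal sketch under a few assumptions. It relies on glibc-style behaviour, where iconv() leaves the input pointer at the start of the unconvertible character. The helper name gbk_to_latin1_lossy is hypothetical. The skip rule, in which a byte in 0x81–0xFE starts a two-byte GBK character, is my own approximation and is not something iconv provides.

    #include <errno.h>
    #include <iconv.h>
    #include <stdio.h>
    #include <string.h>

    /* Hypothetical helper: converts the GBK string src to ISO-8859-1 in dst
       (capacity dstsize, always NUL-terminated), writing '?' for every
       character ISO-8859-1 cannot represent. Returns 0 on success, -1 if
       dst is too small or the converter is unavailable. */
    int gbk_to_latin1_lossy(const char *src, char *dst, size_t dstsize)
    {
        if (dstsize == 0) return -1;
        iconv_t cd = iconv_open("ISO-8859-1", "GBK");
        if (cd == (iconv_t)-1) return -1;

        char *in = (char *)src;          /* iconv() never writes through *inbuf */
        size_t inbytes = strlen(src);
        char *out = dst;
        size_t outbytes = dstsize - 1;   /* keep one byte for the terminator */
        int rc = 0;

        while (inbytes > 0) {
            if (iconv(cd, &in, &inbytes, &out, &outbytes) != (size_t)-1)
                break;                   /* the rest converted cleanly */
            /* Give up on E2BIG (or any other unexpected error), or when there
               is no room left for a placeholder. */
            if ((errno != EILSEQ && errno != EINVAL) || outbytes == 0) {
                rc = -1;
                break;
            }
            int incomplete = (errno == EINVAL);  /* truncated sequence at the end */
            *out++ = '?';
            outbytes--;
            /* GBK assumption: a lead byte in 0x81..0xFE starts a two-byte character */
            size_t skip = ((unsigned char)*in >= 0x81 && inbytes >= 2) ? 2 : 1;
            if (incomplete) skip = inbytes;
            in += skip;
            inbytes -= skip;
        }

        iconv_close(cd);
        *out = '\0';
        return rc;
    }

    int main(void)
    {
        char latin1[64];
        if (gbk_to_latin1_lossy("GBK \xB5\xE7\xCA\xD3\xBB\xFA", latin1, sizeof latin1) == 0)
            puts(latin1);
        return 0;
    }

Because each failure only skips one character, any GBK characters that do exist in ISO-8859-1 (and all ASCII) still come through unchanged. The trade-off compared with //TRANSLIT is that you get a plain placeholder rather than a best-effort approximation.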