
libiconv on Windows/Linux issue


I need to perform character set conversion using iconv on Windows. In this case it is transliteration to remove accents and so on, but the issue I am facing is the same for almost any target encoding. Here is my program:

#include "stdafx.h"
#include <vector>
#include <fstream>
#include <cerrno>
#include <cstring>
#include <iconv.h>
#include <iostream>

int _tmain(int argc, _TCHAR* argv[])
{
    std::ifstream ifs("test.txt", std::ios::binary | std::ios::ate);
    std::ifstream::pos_type pos = ifs.tellg();
    char * pIn = new char[(int)pos + 1];
    ifs.seekg(0, std::ios::beg);
    ifs.read(pIn, pos);
    pIn[pos] = 0;
    size_t srclen = strlen(pIn);

    char dst[1000];
    char * pOut = (char*)dst;
    size_t dstlen = 1000;

    iconv_t conv = iconv_open("UTF-8", "ASCII//TRANSLIT");
    std::cout << srclen << " " << dstlen << std::endl;
    auto ret = iconv(conv, (const char**)&pIn, &srclen, &pOut, &dstlen);
    std::cout << (int)ret << " " << errno << " " << srclen << " " << dstlen << std::endl;
    iconv_close(conv);

    return 0;
}

The test.txt file looks like this (UTF-8 w/o BOM):

qwe
Tøyenbekken
Zażółć gęślą jaźń
ZAŻÓŁĆ GĘŚLĄ JAŹŃ

Unfortunately, the iconv call stops processing at the first non-ASCII character, and the program outputs:

75 1000
-1 0 69 994

The return value of -1 indicates an error, but errno is 0, which gives no clue as to what may be wrong.

Any idea what I am doing wrong here? To make matters more interesting, here is the output of iconv.exe, located in the same directory as the libiconv2.dll file:

> iconv -f utf-8 -t ascii//translit test.txt
qwe
Toyenbekken
Zaz'ol'c ge'sla ja'z'n
ZAZ'OL'C GE'SLA JA'Z'N

which is ok.

Update after testing on Linux: the command-line version of iconv does not work there; it prints garbage to the console in place of the non-ASCII characters. My own code fails with error code 84 (which I guess is EILSEQ, illegal byte sequence) after processing the ASCII characters.

Any ideas what may be wrong here?


Solution

  • The arguments to iconv_open were in the wrong order. I wanted to convert from UTF-8 to ASCII, but opened the converter this way:

    iconv_t conv = iconv_open("UTF-8", "ASCII//TRANSLIT");

    whereas it should be done this way:

    iconv_t conv = iconv_open("ASCII//TRANSLIT", "UTF-8");

    iconv_open takes the target encoding first and the source encoding second. I am still not sure why I did not get a proper error code.
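
For reference, here is a minimal sketch of the conversion with the arguments in the right order and with every return value checked. The helper name utf8_to_ascii is made up for this example and is not part of the original program. The sketch assumes the prototype in which iconv takes a char** input pointer (as in glibc); some older libiconv builds declare const char** instead, in which case the cast has to be adjusted.

```cpp
#include <cerrno>
#include <cstring>
#include <iconv.h>
#include <stdexcept>
#include <string>

// Hypothetical helper (not from the original post): converts UTF-8 text to
// ASCII, transliterating characters that have no ASCII equivalent.
std::string utf8_to_ascii(const std::string& input)
{
    // iconv_open takes (tocode, fromcode): target encoding first.
    iconv_t conv = iconv_open("ASCII//TRANSLIT", "UTF-8");
    if (conv == (iconv_t)-1)
        throw std::runtime_error(std::string("iconv_open: ") + std::strerror(errno));

    // Assumption: iconv is declared with char** for the input buffer.
    // iconv does not modify the input bytes, so the const_cast is safe.
    char* in = const_cast<char*>(input.data());
    size_t inLeft = input.size();

    std::string output;
    char buf[256];
    while (inLeft > 0) {
        char* out = buf;
        size_t outLeft = sizeof buf;
        errno = 0;
        size_t ret = iconv(conv, &in, &inLeft, &out, &outLeft);
        // Capture errno right away, before any other call can change it.
        int err = (ret == (size_t)-1) ? errno : 0;
        output.append(buf, sizeof buf - outLeft);
        // E2BIG only means the chunk buffer is full, so keep looping.
        if (err != 0 && err != E2BIG) {
            iconv_close(conv);
            throw std::runtime_error(std::string("iconv: ") + std::strerror(err));
        }
    }
    iconv_close(conv);
    return output;
}
```

Calling utf8_to_ascii on the contents of test.txt should yield ASCII-only text, but the exact transliteration of each accented character depends on the iconv implementation and, with glibc, on the current locale. Plain ASCII input passes through unchanged.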