I need to perform character set conversion using iconv on Windows. In this case it is transliteration to remove accents and the like, but the issue I am facing is the same for almost any target encoding. Here is my program:
#include "stdafx.h"
#include <vector>
#include <fstream>
#include <iconv.h>
#include <iostream>
int _tmain(int argc, _TCHAR* argv[])
{
    std::ifstream ifs("test.txt", std::ios::binary | std::ios::ate);
    std::ifstream::pos_type pos = ifs.tellg();
    char * pIn = new char[(int)pos + 1];
    ifs.seekg(0, std::ios::beg);
    ifs.read(pIn, pos);
    pIn[pos] = 0;
    size_t srclen = strlen(pIn);
    char dst[1000];
    char * pOut = (char*)dst;
    size_t dstlen = 1000;
    iconv_t conv = iconv_open("UTF-8", "ASCII//TRANSLIT");
    std::cout << srclen << " " << dstlen << std::endl;
    auto ret = iconv(conv, (const char**)&pIn, &srclen, &pOut, &dstlen);
    std::cout << (int)ret << " " << errno << " " << srclen << " " << dstlen << std::endl;
    iconv_close(conv);
    return 0;
}
The test.txt file looks like this (UTF-8 w/o BOM):
qwe
Tøyenbekken
Zażółć gęślą jaźń
ZAŻÓŁĆ GĘŚLĄ JAŹŃ
Unfortunately, the iconv call stops processing at the first non-ASCII character, and the program outputs:
75 1000
-1 0 69 994
The return value of -1 indicates an error, but errno is 0, which gives no clue as to what is wrong.
Any idea what I am doing wrong here? To make matters more interesting, here is the output of iconv.exe, located in the same directory as the libiconv2.dll file:
> iconv -f utf-8 -t ascii//translit test.txt
qwe
Toyenbekken
Zaz'ol'c ge'sla ja'z'n
ZAZ'OL'C GE'SLA JA'Z'N
which is OK.
Update after testing on Linux: the command-line version of iconv does not work there; it prints garbage to the console in place of the non-ASCII characters. My own code fails with errno 84 (which I guess is EILSEQ, illegal byte sequence) after processing the ASCII characters.
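For reference, POSIX documents three errno values for a failed iconv() call, and EILSEQ is one of them. Below is a minimal sketch of a helper that names them; describe_iconv_error is a hypothetical name, not part of iconv or of the program above, and the numeric values behind the constants differ between platforms.

```cpp
#include <cerrno>
#include <string>

// Hypothetical helper (not from the original program): names the errno
// values that POSIX documents for a failed iconv() call.
std::string describe_iconv_error(int err)
{
    switch (err) {
    case EILSEQ: return "EILSEQ: invalid multibyte sequence in the input";
    case EINVAL: return "EINVAL: incomplete multibyte sequence at the end of the input";
    case E2BIG:  return "E2BIG: output buffer is full";
    default:     return "unexpected errno " + std::to_string(err);
    }
}
```

Calling describe_iconv_error(errno) right after iconv returns (size_t)-1 gives a readable message instead of a bare number.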
Any ideas what may be wrong here?
The issue was that I wanted to convert from UTF-8 to ASCII but opened the converter with the arguments swapped:
iconv_t conv = iconv_open("UTF-8", "ASCII//TRANSLIT");
whereas it should be done this way:
iconv_t conv = iconv_open("ASCII//TRANSLIT", "UTF-8");
The first argument of iconv_open is the target encoding and the second is the source encoding. I am still not sure why I did not get a proper error code on Windows.
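Putting the fix together, here is a minimal sketch of the conversion with the arguments in the right order and with the return values checked. The helper name utf8_to_ascii and the 256-byte chunk size are my own choices for illustration. It assumes an iconv whose second parameter is char** (glibc, recent libiconv); older libiconv headers declare it as const char**, in which case a cast like the one in the question is needed.

```cpp
#include <cerrno>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <iconv.h>

// Hypothetical helper (not from the original post): converts UTF-8 text to
// ASCII, asking iconv to transliterate characters ASCII cannot represent.
std::string utf8_to_ascii(const std::string& input)
{
    // iconv_open takes the TARGET encoding first, then the SOURCE encoding.
    iconv_t conv = iconv_open("ASCII//TRANSLIT", "UTF-8");
    if (conv == (iconv_t)-1)
        throw std::runtime_error("iconv_open failed");

    std::string in = input;        // iconv wants a non-const input pointer
    char* src = &in[0];
    size_t srcleft = in.size();

    std::string out;
    char buf[256];                 // assumed chunk size, chosen arbitrarily
    while (srcleft > 0) {
        char* dst = buf;
        size_t dstleft = sizeof buf;
        errno = 0;
        size_t ret = iconv(conv, &src, &srcleft, &dst, &dstleft);
        // Keep whatever was converted before iconv returned.
        out.append(buf, sizeof buf - dstleft);
        // E2BIG only means the chunk is full; anything else is a real error.
        if (ret == (size_t)-1 && errno != E2BIG) {
            iconv_close(conv);
            throw std::runtime_error("iconv failed");
        }
    }
    iconv_close(conv);
    return out;
}
```

Passing the contents of test.txt to utf8_to_ascii should produce output similar to that of iconv.exe shown in the question, although the exact transliterations depend on the iconv implementation and may depend on the current locale.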