I need to load an HTML template file (using std::ifstream), add some content, and then save it as a complete web page. It would be simple enough if not for Polish characters. I've tried all combinations of char/wchar_t, Unicode/Multi-Byte character set, iso-8859-2/utf-8, and ANSI/utf-8, and none of them worked for me: I always got some characters displayed incorrectly, or not displayed at all.
I could paste a lot of code and files here but I'm not sure if that would even help. But maybe you could just tell me: what format/encoding should the template file have, what encoding should I declare in it for the web page and how should I load and save that file to get proper results?
(If my question is not specific enough or you do require code/file examples, let me know.)
Edit: I've tried the library suggested in the comment:
std::string fix_utf8_string(std::string const & str)
{
    std::string temp;
    // Copy str into temp, replacing invalid UTF-8 sequences.
    utf8::replace_invalid(str.begin(), str.end(), std::back_inserter(temp));
    return temp;
}
Call:
fix_utf8_string("wynik działania pozytywny ąśżźćńłóę");
It throws utf8::not_enough_room. What am I doing wrong?
Not sure if that's the (perfect) way to go but the following solution worked for me!
I saved my HTML template file as ANSI (or at least that's what Notepad++ says) and changed every write-to-file-stream-operation:
file << std::string("some text with polish chars: ąśżźćńłóę");
to:
file << ToUtf8("some text with polish chars: ąśżźćńłóę");
where:
// Requires <windows.h> and <string>.
std::string ToUtf8(const std::string & ansiText)
{
    if (ansiText.empty()) return std::string();
    const int ansiSize = static_cast<int>(ansiText.size());
    // Step 1: ANSI (code page 1250) -> wide char (UTF-16).
    const int wideSize = MultiByteToWideChar(1250, 0, ansiText.data(), ansiSize, NULL, 0);
    std::wstring wideText(wideSize, L'\0');
    MultiByteToWideChar(1250, 0, ansiText.data(), ansiSize, &wideText[0], wideSize);
    // Step 2: wide char -> UTF-8 (CP_UTF8 is code page 65001).
    const int utf8Size = WideCharToMultiByte(CP_UTF8, 0, wideText.data(), wideSize, NULL, 0, NULL, NULL);
    std::string utf8Text(utf8Size, '\0');  // sized to fit, no fixed 1024-byte buffer
    WideCharToMultiByte(CP_UTF8, 0, wideText.data(), wideSize, &utf8Text[0], utf8Size, NULL, NULL);
    return utf8Text;
}
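For context, here is a rough sketch of how such a converter might fit into the load/fill/save cycle. It is not the code from my project: ReadAllBytes, RenderPage and the "{{content}}" placeholder are names made up for illustration, and the converter is passed in as a parameter so that ToUtf8 (or any other conversion function) can be plugged in. It assumes the template is stored as ANSI and that the whole page is converted once just before writing.

```cpp
#include <fstream>
#include <functional>
#include <sstream>
#include <string>

// Hypothetical helper: reads a whole file as raw bytes (binary mode, so no
// newline translation or locale conversion touches the data).
std::string ReadAllBytes(const std::string& path)
{
    std::ifstream in(path.c_str(), std::ios::binary);
    std::ostringstream buffer;
    buffer << in.rdbuf();
    return buffer.str();
}

// Hypothetical helper: loads the template, replaces the first occurrence of
// `placeholder` with `content`, runs the result through `convert`
// (for example ToUtf8) and writes the converted bytes to `outputPath`.
bool RenderPage(const std::string& templatePath,
                const std::string& outputPath,
                const std::string& placeholder,
                const std::string& content,
                const std::function<std::string(const std::string&)>& convert)
{
    std::string page = ReadAllBytes(templatePath);

    const std::string::size_type pos = page.find(placeholder);
    if (pos != std::string::npos)
        page.replace(pos, placeholder.size(), content);

    std::ofstream out(outputPath.c_str(), std::ios::binary);
    out << convert(page);
    out.flush();
    return out.good();
}
```

A call could look like RenderPage("template.html", "page.html", "{{content}}", text, ToUtf8). Since the bytes written are UTF-8, the template would presumably declare utf-8 as its charset in the meta tag.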
The basic idea is to use the MultiByteToWideChar() and WideCharToMultiByte() functions to convert the string from ANSI (multi-byte, here code page 1250) to wide char (UTF-16) and then from wide char to UTF-8 (more here: http://www.chilkatsoft.com/p/p_348.asp). The best part is that I didn't have to change anything else: no switching from std::ofstream to std::wofstream, no third-party library, and no change to the way I actually use the file stream, apart from converting the strings to UTF-8, which is necessary.
This should probably work for other languages too (with the matching code page in place of 1250), although I did not test that.
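The same two-step idea (legacy byte to Unicode code point, then code point to UTF-8) can also be sketched without the Windows API. The following is only an illustration under narrow assumptions: the helper names are made up, and the table covers just ASCII plus the 18 Polish letters of Windows-1250, with every other byte replaced by '?'. A real program would use a complete code page table or a conversion library.

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper: maps one Windows-1250 byte to a Unicode code point.
// Only ASCII and the Polish letters are handled; anything else becomes '?'.
static std::uint32_t Cp1250ToCodePoint(unsigned char byte)
{
    if (byte < 0x80) return byte;  // ASCII is the same in both encodings
    switch (byte) {
        case 0xA5: return 0x0104;  // Ą
        case 0xB9: return 0x0105;  // ą
        case 0xC6: return 0x0106;  // Ć
        case 0xE6: return 0x0107;  // ć
        case 0xCA: return 0x0118;  // Ę
        case 0xEA: return 0x0119;  // ę
        case 0xA3: return 0x0141;  // Ł
        case 0xB3: return 0x0142;  // ł
        case 0xD1: return 0x0143;  // Ń
        case 0xF1: return 0x0144;  // ń
        case 0xD3: return 0x00D3;  // Ó
        case 0xF3: return 0x00F3;  // ó
        case 0x8C: return 0x015A;  // Ś
        case 0x9C: return 0x015B;  // ś
        case 0x8F: return 0x0179;  // Ź
        case 0x9F: return 0x017A;  // ź
        case 0xAF: return 0x017B;  // Ż
        case 0xBF: return 0x017C;  // ż
        default:   return '?';     // not covered by this sketch
    }
}

// Hypothetical helper: appends a code point below U+0800 as UTF-8.
// All values produced by Cp1250ToCodePoint fall in that range.
static void AppendUtf8(std::string& out, std::uint32_t cp)
{
    if (cp < 0x80) {
        out += static_cast<char>(cp);                  // 1-byte sequence
    } else {
        out += static_cast<char>(0xC0 | (cp >> 6));    // lead byte
        out += static_cast<char>(0x80 | (cp & 0x3F));  // continuation byte
    }
}

// Converts Windows-1250 text (ASCII + Polish letters only) to UTF-8.
std::string Cp1250PolishToUtf8(const std::string& ansiText)
{
    std::string utf8Text;
    for (unsigned char byte : ansiText)
        AppendUtf8(utf8Text, Cp1250ToCodePoint(byte));
    return utf8Text;
}
```

Each Polish letter takes one byte in Windows-1250 and two bytes in UTF-8, which is why the output string is built up dynamically rather than written into a buffer of the input's size.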