visual-c++ utf-16 wofstream

wofstream writes unnecessary byte


I am experimenting with file output in UTF-16 using wofstream, successfully so far. But I have a problem writing a new line. As I found out with Notepad and a hex editor, a new line on Windows corresponds to 2 symbols: LineFeed and CarriageReturn (0x000A and 0x000D). Trying to reproduce this programmatically led to a weird result.

#include <fstream>
#include <codecvt>
#include <locale>
#define ENDL L"\u000a\u000d"
using namespace std;
int main()
{
    locale utf16(locale(), new codecvt_utf16<wchar_t, 0x10ffffUL, little_endian>());//for writing UTF-16
    wofstream fout(L"text.txt");
    fout.imbue(utf16);
    const unsigned short BOM= 0xFEFF;
    fout.write((wchar_t*)&BOM, 1);
    fout<<L"some text"<<ENDL<<L"more text";
    fout.close();
}

The text that follows ENDL is totally messed up. I found the cause with a hex editor: for ENDL it writes 0D 0A 00 0D 00. That is, for some reason it writes an unnecessary and outright harmful 0D byte before the line feed character, which shifts all following bytes by one position and thus breaks the UTF-16 encoding.

I don't understand why this happens or how I can fix it.


Solution

  • Try opening your file in binary mode:

    std::wofstream fout(L"text", std::ios_base::binary);
    

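    I don't have experience with Windows systems, but it seems that in text mode every newline byte (0x0A) is unhelpfully replaced with an end-of-line sequence (0x0D 0x0A). That replacement works on bytes, after the conversion to UTF-16, so the extra byte pushes everything that follows out of alignment.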

    Also, I would first imbue() the modified locale and then open() the file: once a character is read, calling imbue() has either no effect or undefined behavior (I don't recall which off-hand). I think there is nothing preventing the stream from reading the first buffer upon open(). I don't think that's your actual problem, though.
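
Putting both suggestions together, a minimal sketch might look like the following. The helper names write_utf16le and read_bytes are hypothetical and not part of the question. The sketch takes a narrow std::string path for portability, since the wide-string file name in the question is a Visual C++ extension. It also assumes std::codecvt_utf16 is still available, although it is deprecated since C++17. Note that the usual Windows line ending is carriage return first, then line feed (L"\r\n"), which is the reverse of the order used in ENDL above.

```cpp
#include <codecvt>   // std::codecvt_utf16 (deprecated since C++17)
#include <fstream>
#include <iterator>
#include <locale>
#include <string>
#include <vector>

// Hypothetical helper: writes `text` to `path` as UTF-16LE, preceded by a BOM.
bool write_utf16le(const std::string& path, const std::wstring& text)
{
    std::wofstream fout;
    // Imbue before opening, so nothing is converted with the default locale.
    fout.imbue(std::locale(std::locale(),
        new std::codecvt_utf16<wchar_t, 0x10ffffUL, std::little_endian>));
    // Binary mode: the 0x0A byte must not be expanded to 0x0D 0x0A.
    fout.open(path, std::ios_base::binary);
    if (!fout) return false;
    fout << L'\xFEFF' << text;  // byte order mark first, then the payload
    fout.close();
    return !fout.fail();
}

// Hypothetical helper: reads the raw bytes back so they can be inspected.
std::vector<unsigned char> read_bytes(const std::string& path)
{
    std::ifstream fin(path, std::ios_base::binary);
    return std::vector<unsigned char>(std::istreambuf_iterator<char>(fin),
                                      std::istreambuf_iterator<char>());
}
```

Calling write_utf16le("text.txt", L"some text\r\nmore text") from main should then produce a file in which the line break is stored as 0D 00 0A 00, with no stray 0D byte in between.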