How to use ReadFile with multibyte codes


How do I use ReadFile to read a file into a wchar_t array and then output it to the console?

DWORD read_output()
{
    BOOL success = FALSE;
    DWORD dwRead;
    HANDLE handle = CreateFileW(L"data.txt",
                                GENERIC_READ,
                                0,
                                NULL,
                                3,
                                FILE_ATTRIBUTE_NORMAL,
                                NULL);
    if (handle == INVALID_HANDLE_VALUE)
        printf("Failed to open file\n");
    do
    {
        wchar_t buffer[128];
        success = ReadFile(handle, buffer, 128, &dwRead, NULL);
        wprintf(L"%s", buffer);
    } while(!success || dwRead == 0);
    return 0;
}


int main()
{
    _setmode(fileno(stdout), _O_U16TEXT);
    read_output();
}

This is the kind of output I get:

 ШиÑ
      Ñование.txt  اÙ

What I should get:

 Шифрование.txt  العربية.txt

If I remove L"%s", I get this:

퀠킨톸톄킀킾킲킰킽킸⺵硴⁴�������⺩硴ੴ

Can someone explain in detail how to read multibyte characters using ReadFile?


Solution

  • Your first output example indicates that the file is encoded in UTF-8, so reading it directly into a wchar_t buffer won't work. Read the bytes into a char buffer instead, then use MultiByteToWideChar to convert them from UTF-8 to wide characters. ReadFile does not null-terminate the bytes it reads, so a terminator needs to be added to the end of the buffer before conversion.
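
    To see why the wchar_t read produces Hangul-looking text, here is a small portable sketch (not Windows-specific; the helper name bytes_as_utf16le is made up for illustration) that pairs raw bytes into little-endian 16-bit units, which is effectively what happens when UTF-8 bytes land in a wchar_t buffer on Windows:

    ```c
    #include <assert.h>
    #include <stddef.h>
    #include <stdint.h>

    /* Hypothetical helper: reinterpret raw bytes as little-endian 16-bit units.
       Returns the number of units written; a trailing odd byte is ignored. */
    size_t bytes_as_utf16le(const unsigned char *bytes, size_t len,
                            uint16_t *units, size_t cap)
    {
        assert(bytes != NULL && units != NULL);
        size_t n = 0;
        for (size_t i = 0; i + 1 < len && n < cap; i += 2)
            units[n++] = (uint16_t)(bytes[i] | (bytes[i + 1] << 8));
        return n;
    }
    ```

    Assuming the file starts with a space followed by "Ши" in UTF-8 (bytes 20 D0 A8 D0 B8 D1 ...), the pairs come out as 0xD020, 0xD0A8, 0xD1B8, which are the Hangul syllables at the start of your third output. The bytes are intact; they are just being grouped as the wrong encoding.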

    With your "what I should get" text saved in data.txt as UTF-8, this works:

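    #include <windows.h>
    #include <fcntl.h>
    #include <io.h>
    #include <stdio.h>
    #include <stdlib.h>   // _countof
    
    DWORD read_output()
    {
        DWORD dwRead = 0;
        HANDLE handle = CreateFileW(L"data.txt", 
                                    GENERIC_READ, 
                                    0, 
                                    NULL, 
                                    OPEN_EXISTING, 
                                    FILE_ATTRIBUTE_NORMAL, 
                                    NULL);
        if (handle == INVALID_HANDLE_VALUE)
        {
            // stdout is in _O_U16TEXT mode, so use the wide output functions.
            wprintf(L"Failed to open file\n");
            return 1;
        }
    
        char buffer[128];
        wchar_t buffer2[128];
        // Read at most sizeof buffer - 1 bytes so the terminator below fits.
        BOOL success = ReadFile(handle, buffer, sizeof buffer - 1, &dwRead, NULL);
        CloseHandle(handle);
        if (!success)
        {
            wprintf(L"Failed to read file\n");
            return 1;
        }
        buffer[dwRead] = 0;
        // A length of -1 converts up to and including the terminator,
        // so buffer2 ends up null-terminated as well.
        if (MultiByteToWideChar(CP_UTF8, 0, buffer, -1, buffer2, _countof(buffer2)) == 0)
        {
            wprintf(L"Failed to convert from UTF-8\n");
            return 1;
        }
        wprintf(L"%ls\n", buffer2);
        return 0;
    }
    
    
    int main()
    {
        _setmode(_fileno(stdout), _O_U16TEXT);
        return (int)read_output();
    }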
    

    The output in the Windows command prompt is as follows. Note that the console font needs to contain glyphs for these characters for them to display properly; even when it does not, copying the text from the console and pasting it into SO shows the correct characters:

    Шифрование.txt  العربية.txt
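
    The code above reads a single chunk. If you go back to reading in a loop, as in your question, a read can end in the middle of a multi-byte UTF-8 sequence, and converting that chunk on its own would garble the character at the boundary. A minimal sketch of one way to handle this, assuming the file is otherwise valid UTF-8 (the helper name utf8_complete_prefix is made up for illustration, and this is plain C rather than a Windows API):

    ```c
    #include <assert.h>
    #include <stddef.h>

    /* Hypothetical helper: return how many leading bytes of buf[0..len) form
       complete UTF-8 sequences. Bytes after that belong to a sequence cut off
       by the end of the chunk and should be carried over to the next read.
       Assumes valid UTF-8 apart from the possible truncation at the end. */
    size_t utf8_complete_prefix(const unsigned char *buf, size_t len)
    {
        assert(buf != NULL || len == 0);
        size_t i = len;
        /* Step back over at most 3 trailing continuation bytes (10xxxxxx). */
        while (i > 0 && len - i < 3 && (buf[i - 1] & 0xC0) == 0x80)
            i--;
        if (i == 0)
            return len;              /* no lead byte found; hold nothing back */

        unsigned char lead = buf[i - 1];
        size_t need;
        if      ((lead & 0x80) == 0x00) need = 1;   /* ASCII */
        else if ((lead & 0xE0) == 0xC0) need = 2;
        else if ((lead & 0xF0) == 0xE0) need = 3;
        else if ((lead & 0xF8) == 0xF0) need = 4;
        else return len;             /* not a lead byte; leave input as is */

        size_t have = len - (i - 1); /* bytes present for the last sequence */
        return have < need ? i - 1 : len;
    }
    ```

    In a ReadFile loop, the idea would be to pass only the first utf8_complete_prefix(buffer, count) bytes to MultiByteToWideChar, move the leftover bytes to the start of the buffer with memmove, and have the next ReadFile append after them.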