Tags: assembly, unicode, console, x86-64, masm

How can I write Unicode (UTF-16) text to the console with WinAPI functions?


I have 64-bit MASM code that writes to the console. The problem is that WriteConsoleW writes only to the console screen buffer, so I can't redirect the program's output to a file or pipe it into another command. But WriteFile puts a space between each character, because the high-order byte of each 16-bit character is zero. How can I print Unicode text with WriteFile?

I read that I could write a byte order mark (BOM), but that doesn't work for me: I added another WriteFile call that writes the two bytes FF FE before the second WriteFile call, and it just printed a white rectangle and nothing else.

Here's the code:

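extern GetStdHandle: proc
extern WriteConsoleW: proc
.data?
    written dq ?                ; receives the number of characters written
.data
    string dw 0048h,0065h,006ch,006ch,006fh,0020h,0057h,006fh,0072h,006ch,0064h,0021h
    len equ ($-string)/2        ; length in WCHARs (WriteFile would need len*2 bytes)
.code
main proc
    push    rbp
    mov rbp, rsp
    sub rsp, 030h               ; 32-byte shadow space + slot for the 5th argument
    and rsp, -10h               ; keep the stack 16-byte aligned for calls

    mov rcx, -11                ; STD_OUTPUT_HANDLE
    call    GetStdHandle
    mov rcx, rax                ; hConsoleOutput
    lea rdx, string             ; lpBuffer
    mov r8, len                 ; nNumberOfCharsToWrite (characters, not bytes)
    lea r9, written             ; lpNumberOfCharsWritten: pass the address, not the value
    mov qword ptr [rsp+20h], 0  ; lpReserved (5th argument) must be NULL
    call    WriteConsoleW

    mov rsp, rbp                ; restores the stack, so no separate add rsp is needed
    pop rbp
    ret
main endp
end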

When I swap WriteConsoleW for WriteFile, it prints correctly when run from Visual Studio, but when I run the generated .exe from a command line, instead of Hello World! it prints H e l l o W o r l d !

Does anyone know how to deal with that?

EDIT: I'm not sure how to interpret this, but when I use WriteFile, the 16-bit characters are printed wrong only when I run the program on its own. When I redirect the output to the echo command in PowerShell, it prints normally.


Solution

    The same APIs in C++ produce the same console output. WriteConsoleW translates the characters for the console; WriteFile doesn't. WriteFile just sends bytes to the console, which interprets them in the current output code page; for me that is 437 (OEM United States).

    I was able to get it to work in C++ by calling SetConsoleOutputCP(65001) (which sets the console output code page to UTF-8) and then writing a UTF-8 string. Note that Microsoft's list of code page identifiers includes UTF-16 (1200), but it is only available to managed applications (e.g. C#).

    I printed some non-ASCII characters to check that they come out correctly.

    // compiled with MSVC: "cl /W4 /utf-8 test.cpp"
    // source saved in UTF-8 as well.
    #include <windows.h>
    
    int main() {
        char s[] = u8"Hello, 马克"; // Note: needs a Chinese font, but cut/paste
                                   // to Notepad and you'll see them if you don't.
        SetConsoleOutputCP(65001);  // CP_UTF8: the console now decodes bytes as UTF-8
        auto h = GetStdHandle(STD_OUTPUT_HANDLE);
        DWORD written;
        WriteFile(h, s, sizeof(s) - 1 /* don't send the null */, &written, nullptr);
    }
    

    Output:

    Hello, 马克
    

    You should be able to adapt this to MASM easily.

    If you are willing to use the C runtime library, the following APIs work with UTF-16 for both the console and a file, provided you set the mode appropriately:

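    #include <stdio.h>
    #include <io.h>
    #include <fcntl.h>
    #include <sys/stat.h>
    
    int main()
    {
        _setmode(_fileno(stdout), _O_U16TEXT);  // stdout in UTF-16 mode
        wchar_t s[] = L"Hello, 马克!";
        _write(_fileno(stdout), s, sizeof(s) - sizeof(wchar_t));  // omit the null
        // _O_CREAT requires the permission (pmode) argument.
        int fd = _open("test.txt", _O_CREAT | _O_TRUNC | _O_WRONLY | _O_U16TEXT,
                       _S_IREAD | _S_IWRITE);
        if (fd == -1)
            return 1;
        _write(fd, s, sizeof(s) - sizeof(wchar_t));
        _close(fd);
    }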
    

    Output to console:

    Hello, 马克!
    

    Output to test.txt is encoded in UTF-16LE. Note that 马克 is the two Unicode code points U+9A6C and U+514B.

    EDIT

    Here's a demo of GetFileType. When run directly, it writes to the console correctly. When redirected to a file, e.g. "test > out.txt", the output file contains UTF-16LE-encoded data.

    #include <windows.h>
    
    int main()
    {
        auto h = GetStdHandle(STD_OUTPUT_HANDLE);
        auto type = GetFileType(h);
        
        WCHAR s[] = L"Only 20\u20AC!";  // U+20AC is EURO sign.
        DWORD written;
        
        if(type == FILE_TYPE_CHAR)  // console: let WriteConsoleW translate the text
            WriteConsoleW(h, s, sizeof(s) / sizeof(WCHAR) - 1, &written, nullptr);
        else  // disk file or pipe: WriteConsoleW would fail, so send raw UTF-16LE bytes
            WriteFile(h, s, sizeof(s) - sizeof(WCHAR) /* don't send the null */, &written, nullptr);
    }
    

    Output to console:

    Only 20€!
    

    Output redirected to out.txt: the UTF-16LE bytes of the string, without a BOM or the terminating null.