I have 64-bit MASM code that writes to the console. The problem is that with WriteConsoleW I can't redirect the program's output, since it writes only to the console buffer. But WriteFile appears to add a space between characters, because each 16-bit character's high-order byte is zero. How can I print Unicode text with WriteFile?
I read that I could use a BOM, but that doesn't work for me (I added another WriteFile call that writes the two bytes FF FE before the second WriteFile call, but it just printed a white rectangle and nothing else).
Here's the code:
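extern GetStdHandle: proc
extern WriteConsoleW: proc

.data?
written dq ?                        ; receives the number of characters written

.data
string  dw 0048h,0065h,006ch,006ch,006fh,0020h,0057h,006fh,0072h,006ch,0064h,0021h
len     equ ($-string)/2            ; WriteConsoleW takes a character count, not a byte count
                                    ; (WriteFile would need the byte count, $-string)

.code
main proc
    push rbp
    mov  rbp, rsp
    sub  rsp, 030h                  ; 32 bytes of shadow space + a slot for the 5th argument
    and  rsp, -10h                  ; keep the stack 16-byte aligned for the calls
    mov  rcx, -11                   ; STD_OUTPUT_HANDLE
    call GetStdHandle
    mov  rcx, rax                   ; hConsoleOutput
    lea  rdx, string                ; lpBuffer
    mov  r8, len                    ; nNumberOfCharsToWrite
    lea  r9, written                ; lpNumberOfCharsWritten: pass the address, not the value
    mov  qword ptr [rsp+20h], 0     ; lpReserved (5th argument) must be NULL
    call WriteConsoleW
    mov  rsp, rbp                   ; restore the stack pointer
    pop  rbp
    ret
main endp
end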
When I swap WriteConsoleW for WriteFile, it prints correctly when run through Visual Studio, but when I run the generated exe from a command line, instead of printing Hello World! it prints H e l l o W o r l d !
Does anyone know how to deal with that?
EDIT:
I'm not sure how to interpret this, but when I use WriteFile instead, the 16-bit characters are printed wrong only when I run the program on its own. When I redirect the output to the echo command, it prints normally.
The same APIs in C++ produce the same console output. WriteConsoleW performs a character translation to the console that WriteFile doesn't.
WriteFile just sends bytes to the console, which interprets them in the current code page, which for me is 437 (OEM United States).
I was able to get it to work in C++ by calling SetConsoleOutputCP(65001) (set the console code page to UTF-8) and then writing a UTF-8 string. Note that the list of code page identifiers also includes UTF-16, but those entries are available only to managed applications (e.g. C#).
I printed some non-ASCII characters to check that they came out correctly.
// compiled with MSVS "cl /W4 /utf-8 test.cpp"
// source saved in UTF-8 as well.
#include <windows.h>

int main() {
    char s[] = u8"Hello, 马克"; // Note: need a Chinese font, but cut/paste
                                // to Notepad and you'll see them if you don't.
    SetConsoleOutputCP(65001);  // switch the console output code page to UTF-8
    auto h = GetStdHandle(STD_OUTPUT_HANDLE);
    DWORD written;
    WriteFile(h, s, sizeof(s) - 1 /* don't send the null */, &written, nullptr);
}
Output:
Hello, 马克
You should be able to adapt this to MASM easily.
If you are willing to use the C runtime library, then _write works with UTF-16 for both the console and files, provided you set the console and file mode appropriately:
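#include <stdio.h>
#include <io.h>
#include <fcntl.h>
#include <sys/stat.h>

int main()
{
    _setmode(_fileno(stdout), _O_U16TEXT);
    wchar_t s[] = L"Hello, 马克!";
    _write(_fileno(stdout), s, sizeof(s) - sizeof(wchar_t)); // don't send the null
    // _O_CREAT requires the third (permission) argument.
    int fd = _open("test.txt", _O_CREAT | _O_WRONLY | _O_U16TEXT, _S_IREAD | _S_IWRITE);
    _write(fd, s, sizeof(s) - sizeof(wchar_t));
    _close(fd);
}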
Output to console:
Hello, 马克!
The output in test.txt is encoded in UTF-16LE. Note that 马克 is the two Unicode code points U+9A6C and U+514B.
EDIT
Here's a demo of GetFileType. If run directly, it writes to the console correctly. If redirected to a file, e.g. "test > out.txt", the output file contains UTF-16LE-encoded data.
#include <windows.h>

int main()
{
    auto h = GetStdHandle(STD_OUTPUT_HANDLE);
    auto type = GetFileType(h);
    WCHAR s[] = L"Only 20\u20AC!"; // U+20AC is EURO sign.
    DWORD written;
    if(type == FILE_TYPE_DISK)
        WriteFile(h, s, sizeof(s) - sizeof(WCHAR) /* don't send the null */, &written, nullptr);
    else
        WriteConsoleW(h, s, sizeof(s) / sizeof(WCHAR) - 1, &written, nullptr);
}
Output to console:
Only 20€!
Output redirected to out.txt: