Let's say I open an existing file with the CreateFile function. The content of the file is Hello world. There is also a zero-filled buffer (a char array of 12 bytes: 11 for the text plus a terminating null) that should receive the contents of the file.
When I try to read the file with the ReadFile function, garbage is written to the buffer. The debugger (I've tried GDB and LLDB) shows the buffer contents after reading as:
\377?H\000e\000l\000l\000o\000\000w\000o\000r\000l\000d\000\r\000\n\000\r\000 \n, '\000'
In human-readable form, it looks like this: ■ H
I've tried not filling the buffer with zeros. I've tried writing to the file (with WriteFile) first and then reading it. I've also tried changing the number of bytes ReadFile should read. None of this changes anything.
Also, GetLastError returns ERROR_SUCCESS.
Code:
```c
#define WIN32_LEAN_AND_MEAN
#include <Windows.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void) {
    HANDLE file = CreateFile("./test_file.txt", GENERIC_READ, FILE_SHARE_READ,
                             NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
    if (file == INVALID_HANDLE_VALUE) {
        puts("Failed to open");
        return EXIT_FAILURE;
    }

    size_t length = strlen("Hello world"); /* 11 */
    char buffer[12];
    DWORD count = 0; /* Is always 11 (length of "Hello world") after reading */
    memset(buffer, '\0', length + 1);

    if (!ReadFile(file, buffer, (DWORD) length, &count, NULL)) {
        puts("Failed to read or EOF reached.");
        CloseHandle(file);
        return EXIT_FAILURE;
    }

    printf("buffer: '%s'\n", buffer);
    printf("count: %lu\n", count);
    CloseHandle(file);
    return EXIT_SUCCESS;
}
```
In the console, the output of the program looks like this:
```
buffer: ' ■ H'
count: 11
```
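The text file itself is not encoded in 7-bit ASCII or 8-bit UTF-8, as you are expecting. It is actually encoded in UTF-16, with a BOM (byte order mark) at the front of the file (bytes 0xFF 0xFE for UTF-16LE). You are simply reading the file's raw bytes and displaying them as-is, without any regard to their encoding. The 11 bytes you read are FF FE 48 00 65 00 6C 00 6C 00 6F: the BOM followed by "Hello", with each character stored as two bytes. Because the second byte of 'H' is 0x00, printf with %s stops there and prints only the two BOM bytes and the H, which the console renders as the ' ■ H' you see. To get the text, read the whole file (the BOM plus "Hello world" alone is 2 + 11 * 2 = 24 bytes), skip the BOM, and convert the UTF-16 data to the encoding you want, or save the file as ASCII/UTF-8 instead.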