c++ winapi win32gui

How to fix garbled text when using ReadFile?


I'm writing a Win32 application. I use "ReadFile" to read a text file that is written in Unicode, so that its contents can be displayed in an edit box.

const TCHAR FILE_DIRECTORY[] = TEXT("data/");
const TCHAR FILE_LIST[][MAX_LOADSTRING] = { 
    TEXT("fputs_fgets.h"), TEXT("fprintf_fscanf.h"), 
    TEXT("fprintfs_fscanfs.h"), TEXT("fread_fwrite.h"), TEXT("freads_fwrite.h") };
const int FILE_NAME_LENGTH = _tcslen(FILE_LIST[idx]);
const int FILE_DIRECTORY_LENGTH = _tcslen(FILE_DIRECTORY);

TCHAR* filePath = (TCHAR*)calloc(FILE_NAME_LENGTH + FILE_DIRECTORY_LENGTH + 1, sizeof(TCHAR));
_tcscpy_s(filePath, FILE_DIRECTORY_LENGTH + 1, FILE_DIRECTORY);
_tcscat_s(filePath, FILE_NAME_LENGTH + FILE_DIRECTORY_LENGTH + 1, FILE_LIST[idx]);

HANDLE file = CreateFile(filePath, GENERIC_READ, 0, NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
DWORD fileSize = GetFileSize(file, NULL);
DWORD dwRead;

if (editText != NULL)
    free(editText);
editText = (TCHAR*)calloc(1, fileSize + 1);
ReadFile(file, editText, fileSize, &dwRead, NULL);
CloseHandle(file);
free(filePath);

However, some strange characters appear at the end of the output (the Korean string means "y coordinate (integer)"):

        printf("y좌표(정수): %d\n", point.y);
    }

    fclose(file);
}ﴀ﷽ý

How can I fix it? Thank you.


Solution

  • Assuming your file is UTF-16 and you are compiling with _UNICODE defined (assumptions justified by the fact that the rest of your text is read correctly), in this line:

    editText = (TCHAR*)calloc(1, fileSize + 1);
    

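    you should allocate fileSize + sizeof(TCHAR) bytes if you want to rely on the zeroing that calloc does to get a NUL-terminated string. As it is now, the data is followed by only one zero byte: the wide terminator that should come after the last character has its low byte zeroed, but its high byte lies past the end of the allocated block. The rest of your code therefore keeps reading garbage (which is undefined behavior) until it happens to find two consecutive zero bytes at a properly aligned (even) offset.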

    Mind you, I'm extremely dubious about this code in general. Using TCHAR means you want the code to compile both as ANSI (TCHAR == char) and as Unicode (TCHAR == wchar_t), and letting that build setting change how you interpret the bytes of an external file is a questionable idea.