This is an extension of this question: fstream not opening files with accent marks in pathname
The problem is the following: a program fails to open a simple NTFS text file that has accent marks (e.g. à, ò, ...) in its pathname. In my tests I'm using a file with pathname I:\università\foo.txt (università is the Italian for university).
The following is the test program:
```cpp
#include <iostream>
#include <fstream>
#include <string>
#include <cstdio>
#include <cstring>   // strerror
#include <cerrno>    // errno
#include <Windows.h>

using namespace std;

// Narrow and wide spellings of the same path. String literals are const,
// so the pointer types are LPCSTR/LPCWSTR rather than LPSTR/LPWSTR.
LPCSTR cPath = "I:/università/foo.txt";
LPCWSTR widecPath = L"I:/università/foo.txt";
string path("I:/università/foo.txt");

void tryWithStandardC();
void tryWithStandardCpp();
void tryWithWin32();

int main(int argc, char **argv) {
    tryWithStandardC();
    tryWithStandardCpp();
    tryWithWin32();
    return 0;
}

void tryWithStandardC() {
    FILE *stream = fopen(cPath, "r");
    if (stream) {
        cout << "File opened with fopen!" << endl;
        fclose(stream);
    }
    else {
        cout << "fopen() failed: " << strerror(errno) << endl;
    }
}

void tryWithStandardCpp() {
    ifstream s;
    s.exceptions(ifstream::failbit | ifstream::badbit | ifstream::eofbit);
    try {
        s.open(path.c_str(), ifstream::in);
        cout << "File opened with c++ open()" << endl;
        s.close();
    }
    catch (const ifstream::failure& f) {   // catch by reference, not by value
        cout << "Exception " << f.what() << endl;
    }
}

void tryWithWin32() {
    DWORD error;
    // Narrow (ANSI) variant, named explicitly so it compiles whether or not UNICODE is defined.
    HANDLE h = CreateFileA(cPath, GENERIC_READ, 0, NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
    if (h == INVALID_HANDLE_VALUE) {
        error = GetLastError();
        cout << "CreateFile failed: error number " << error << endl;
    }
    else {
        cout << "File opened with CreateFile!" << endl;
        CloseHandle(h);
        return;
    }
    // Wide (UTF-16) variant.
    HANDLE wideHandle = CreateFileW(widecPath, GENERIC_READ, 0, NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
    if (wideHandle == INVALID_HANDLE_VALUE) {
        error = GetLastError();
        cout << "CreateFileW failed: error number " << error << endl;
    }
    else {
        cout << "File opened with CreateFileW!" << endl;
        CloseHandle(wideHandle);
    }
}
```
The source file is saved with UTF-8 encoding. I'm using Windows 8.
This is the output of the program compiled with VC++ (Visual Studio 2012):
```
fopen() failed: No such file or directory
Exception ios_base::failbit set
CreateFile failed: error number 3
CreateFileW failed: error number 3
```
This is the output using MinGW g++:
```
fopen() failed: No such file or directory
Exception basic_ios::clear
CreateFile failed: error number 3
File opened with CreateFileW!
```
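So let's go to the questions:

1. Why fopen() and std::ifstream works in a similar test in Linux but they don't in Windows?
2. Why CreateFileW() works only compiling with g++?
3. Does exist a cross-platform alternative to CreateFile?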
I hope that a file with an arbitrary pathname can be opened without platform-specific code, but I have no idea how to do it.
Thanks in advance.
You write:
“The source file is saved with UTF-8 encoding.”
Well, that's all well and good (so far) if you're using the g++ compiler, which by default assumes UTF-8 as the source file encoding. However, Visual C++ by default assumes that a source file is encoded in Windows ANSI, unless it can tell otherwise. So make very sure that the file has a BOM (Byte Order Mark) at the start, which (undocumented, as far as I know) causes Visual C++ to treat it as encoded with UTF-8.
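For reference, the UTF-8 BOM is simply the three bytes EF BB BF at the very start of the file. A minimal sketch for checking whether a file starts with them; the helper names `startsWithUtf8Bom` and `fileHasUtf8Bom` are hypothetical, and the check only inspects bytes, it says nothing about how a given compiler version will actually treat the file.

```cpp
#include <cstddef>
#include <fstream>
#include <string>

// The UTF-8 byte order mark is U+FEFF encoded as the three bytes EF BB BF.
bool startsWithUtf8Bom(const std::string& bytes) {
    return bytes.size() >= 3 &&
           static_cast<unsigned char>(bytes[0]) == 0xEF &&
           static_cast<unsigned char>(bytes[1]) == 0xBB &&
           static_cast<unsigned char>(bytes[2]) == 0xBF;
}

// Hypothetical helper: read at most the first three bytes of a file and test them.
// A file that cannot be opened simply yields false.
bool fileHasUtf8Bom(const std::string& filename) {
    std::ifstream in(filename, std::ios::binary);
    std::string head(3, '\0');
    in.read(&head[0], 3);
    head.resize(static_cast<std::size_t>(in.gcount()));  // keep only bytes actually read
    return startsWithUtf8Bom(head);
}
```

For example, a source file saved as "UTF-8 without signature" would make `fileHasUtf8Bom` return false, which is the situation that leads Visual C++ to fall back to the ANSI codepage.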
You then ask,
“1. Why fopen() and std::ifstream works in a similar test in Linux but they don't in Windows?”
For Linux it's likely to work because (1) modern Linux is UTF-8 oriented, so if the filename looks the same as the UTF-8 encoded filename in the source code, it very likely is the same, and (2) in *nix a filename is just a sequence of bytes, not a sequence of characters. This means that regardless of how it looks, if you pass the identical sequence of bytes (the same values) you have a match; otherwise not.
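A small illustration of the byte-sequence point, assuming a UTF-8 locale on the *nix side: below, università is spelled as its UTF-8 bytes using escapes, so the result does not depend on how the source file itself is saved. The helper `hexBytes` is hypothetical and only makes the bytes visible; the decomposed spelling shows that two strings can render alike yet be different filenames.

```cpp
#include <cstdio>
#include <string>

// UTF-8 spelling of "università": the "à" (U+00E0) is the two bytes C3 A0.
// Escapes keep this independent of the source file's own encoding.
const std::string kUtf8Name = "universit\xC3\xA0";

// Decomposed (NFD) spelling: plain 'a' followed by U+0300 COMBINING GRAVE
// ACCENT (CC 80). It usually renders the same, but it is a different byte
// sequence, so on *nix it names a different file.
const std::string kUtf8NameNfd = "universita\xCC\x80";

// Hypothetical helper: render a byte string as space-separated uppercase hex.
std::string hexBytes(const std::string& s) {
    std::string out;
    char buf[4];  // two hex digits, a space, and the terminating NUL
    for (unsigned char c : s) {
        std::snprintf(buf, sizeof buf, "%02X ", c);
        out += buf;
    }
    if (!out.empty()) out.pop_back();  // drop the trailing space
    return out;
}
```

Here `hexBytes(kUtf8Name)` gives `75 6E 69 76 65 72 73 69 74 C3 A0`, while the decomposed form ends in `61 CC 80` instead, so a byte-wise comparison (which is all a *nix kernel does) treats them as unrelated names.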
In contrast, in Windows a filename is a sequence of characters that can be encoded in various ways.
In your case, Visual C++ reads the UTF-8 encoded filename in the source code as if it were Windows ANSI, and that is how it ends up in the executable (and yes, the result of building with Visual C++ depends on the selected ANSI codepage in Windows, which also, as far as I know, is undocumented). Then this gobbledegook string is passed down a routine hierarchy and converted to UTF-16, which is the standard character encoding in Windows. The result doesn't match the filename at all.
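The mangling can be simulated portably. Assuming the ANSI codepage is 1252 (Western European, the usual one on an Italian Windows), a compiler without a BOM hint decodes the file one byte per character. For the two bytes that matter here, C3 and A0, codepage 1252 agrees with ISO 8859-1, so the sketch maps each byte straight to the code point with the same value; that shortcut is an assumption and would be wrong for bytes 0x80 to 0x9F. `decodeAsLatin1` is a hypothetical name.

```cpp
#include <string>

// Hypothetical helper: interpret each byte as one character, as a single-byte
// codepage does. Mapping byte value -> identical code point matches Windows-1252
// only outside 0x80..0x9F (an assumption that holds for the bytes used here).
std::u32string decodeAsLatin1(const std::string& bytes) {
    std::u32string out;
    for (unsigned char c : bytes)
        out += static_cast<char32_t>(c);
    return out;
}

// What the BOM-less source file actually contains: the UTF-8 bytes of "università".
const std::string kSourceBytes = "universit\xC3\xA0";

// What the programmer meant, written with a universal character name.
const std::u32string kIntended = U"universit\u00E0";

// What a compiler assuming codepage 1252 sees instead:
// "universit" + U+00C3 (Ã) + U+00A0 (no-break space).
const std::u32string kMisread = decodeAsLatin1(kSourceBytes);
```

So `kMisread` has eleven characters where the real filename has ten, and its last two never match the `à` on disk. The same kind of mismatch arises whether the misreading is baked into a wide literal at compile time or happens when a narrow string is converted to UTF-16 at run time.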
You further ask,
“2. Why CreateFileW() works only compiling with g++?”
Presumably because you did not include a BOM at the start of the source code file (see above).
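A further option, offered as a hedged suggestion rather than as part of the diagnosis above, is to keep the source file pure ASCII and spell the accented character with a universal character name. Because `\u00E0` is interpreted by the compiler rather than read from the file's bytes, the literal should come out as the intended code point whichever encoding the compiler assumes for the source. A minimal sketch; `kWidePath` is an illustrative name.

```cpp
#include <cwchar>

// "I:/università/foo.txt" with the "à" written as a universal character name
// (U+00E0 LATIN SMALL LETTER A WITH GRAVE). The source stays pure ASCII, so
// the literal no longer depends on whether the file is read as UTF-8 or ANSI.
// wchar_t is 16-bit on Windows and 32-bit on Linux; U+00E0 fits in one unit either way.
const wchar_t* const kWidePath = L"I:/universit\u00E0/foo.txt";

// On Windows this is what one would pass to CreateFileW, or, with Visual C++,
// to its std::ifstream constructor/open overloads that accept const wchar_t*.
```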
With a BOM everything works nicely with Visual C++, at least in Windows 7:
```
File opened with fopen!
File opened with c++ open()
File opened with CreateFile!
```
Finally, you ask,
“3. Does exist a cross-platform alternative to CreateFile?”
Not really. There is Boost filesystem. But while its version 2 did have a workaround for the standard library's lossy narrow-character-based encoding, that workaround was removed in version 3, which simply relies on a Visual C++ extension of the standard library: the Visual C++ implementation provides wide character argument versions of the stream constructors and open. I.e., at least as far as I know (I haven't checked lately whether things have been fixed), Boost filesystem only works in general with Visual C++, not with e.g. g++, although it does work for filenames without troublesome characters.
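Why narrow-character file APIs are lossy on Windows can be modeled in a few lines: a `char` path has to pass through the ANSI codepage, and a character that the codepage lacks has no byte to become. The sketch below assumes, purely for illustration, an ANSI codepage covering exactly U+0000 to U+00FF (ISO 8859-1) and substitutes '?' for anything else; real code would call WideCharToMultiByte with the codepage from GetACP, whose exact substitution behavior depends on the codepage and flags. `narrowToAnsi` is a hypothetical name.

```cpp
#include <string>

// Hypothetical model of a wide-to-ANSI conversion, ASSUMING an ANSI codepage
// that covers exactly U+0000..U+00FF (ISO 8859-1). Characters outside that
// range become '?', and 'lossy' is set to true if that happened.
std::string narrowToAnsi(const std::u32string& wide, bool& lossy) {
    std::string out;
    lossy = false;
    for (char32_t c : wide) {
        if (c <= 0xFF) {
            out += static_cast<char>(static_cast<unsigned char>(c));
        } else {
            out += '?';  // no byte in this codepage for c
            lossy = true;
        }
    }
    return out;
}
```

With this model `U"universit\u00E0"` converts without loss, while a Greek name such as `U"\u03B1\u03B2.txt"` comes out as `"??.txt"`, which no longer names the original file; that is the case a narrow-only API cannot handle at all.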
The workaround that v2 had was to try converting to Windows ANSI (the codepage reported by the GetACP function) and, if that didn't work, to try GetShortPathName, whose result is practically guaranteed to be representable in Windows ANSI.
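The order of that fallback can be sketched independently of Win32. This is not Boost's actual code: in this hedged sketch the two Windows calls are injected as hypothetical callbacks, `ToAnsi` standing in for a WideCharToMultiByte(GetACP(), ...) conversion that reports loss, and `ShortNameOf` standing in for GetShortPathNameW. The function name `ansiPathFor` is likewise hypothetical.

```cpp
#include <functional>
#include <string>

// Hypothetical callback types standing in for the Win32 calls:
//   ToAnsi:      like WideCharToMultiByte(GetACP(), ...); returns false if lossy
//   ShortNameOf: like GetShortPathNameW; returns false if there is no 8.3 alias
using ToAnsi = std::function<bool(const std::wstring&, std::string&)>;
using ShortNameOf = std::function<bool(const std::wstring&, std::wstring&)>;

// Try the long path first; if it cannot be represented in the ANSI codepage,
// retry with its short (8.3) alias. Returns false if both attempts fail,
// in which case the contents of 'result' are unspecified.
bool ansiPathFor(const std::wstring& path, const ToAnsi& toAnsi,
                 const ShortNameOf& shortNameOf, std::string& result) {
    if (toAnsi(path, result))
        return true;                   // 1. long name is representable as is
    std::wstring shortPath;
    if (shortNameOf(path, shortPath) && toAnsi(shortPath, result))
        return true;                   // 2. 8.3 alias, normally representable in ANSI
    return false;                      // 3. e.g. short names disabled on the volume
}
```

Keeping the two Win32 dependencies behind callbacks makes the fallback order (long name, then 8.3 alias, then give up) visible and testable on its own; on Windows the callbacks would wrap the real API calls.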
Part of the reason that the workaround in Boost filesystem was removed was, as I understand it, that it's in principle possible for the user to turn off the Windows short name functionality at least in Windows Vista and earlier. However that's not a practical concern. It just means that there is an easy fix available (namely turn it back on) if the user experiences problems due to having wilfully lobotomized the system.