Tags: c++, unicode, utf-8, filenames

C++ and file paths containing national characters (encoded in UTF-8, maybe)


I have some cross-platform code that reads a config file. Everything works fine, except when the config file path contains non-ANSI characters.

To open and read the file I use std::ifstream. On Windows (MSVC), the solution is to use the std::ifstream constructor overload that accepts the path as wchar_t* (a non-standard MSVC extension), so the path is encoded in UTF-16 and national characters in it cause no problems.
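If a C++17 compiler is available, a standard alternative to the MSVC-only wchar_t* overload may be worth considering: C++17 gave std::ifstream a constructor taking std::filesystem::path, and std::filesystem::u8path interprets a narrow string as UTF-8 and converts it to the native path encoding (UTF-16 on Windows). A minimal sketch, assuming the path is held as UTF-8 in a std::string; open_utf8_path is a hypothetical helper name. Note that u8path is deprecated in C++20 (in favour of building a path from a char8_t string), although it is still provided.

```cpp
#include <filesystem>
#include <fstream>
#include <string>

// Hypothetical helper: open a file whose path is given as UTF-8 bytes.
// std::filesystem::u8path (C++17) decodes the narrow string as UTF-8 and
// converts it to the native path encoding: UTF-16 on Windows; on POSIX
// systems with a UTF-8 locale the bytes are typically kept as-is.
std::ifstream open_utf8_path(const std::string& utf8_path)
{
    // C++17 std::ifstream constructor taking std::filesystem::path;
    // on Windows this opens the file through its wide-character name.
    return std::ifstream(std::filesystem::u8path(utf8_path));
}
```

Because the conversion happens inside std::filesystem::path, the same call can be used on both platforms without an #ifdef around the file-opening code.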

But what is the solution for *nix systems? As far as I know, file names there are all encoded in UTF-8, so it's fine to use a char* pointer to the string. For example:

std::string path_name = ...; //assigning path name
std::ifstream fin(path_name.c_str());
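A minimal runnable version of the *nix approach from the question, assuming a POSIX system where file names are byte strings and UTF-8 is the usual convention. read_first_line and the sample file name used with it are hypothetical; the name is best written with explicit UTF-8 byte escapes so the example does not depend on the source file's encoding.

```cpp
#include <fstream>
#include <string>

// Hypothetical helper: open a file whose name is stored as UTF-8 bytes in a
// std::string, passing the bytes to std::ifstream through c_str() exactly as
// in the question, and read its first line.
bool read_first_line(const std::string& path_name, std::string& line)
{
    std::ifstream fin(path_name.c_str());
    if (!fin)
        return false;  // file missing or not readable
    return static_cast<bool>(std::getline(fin, line));
}
```

With MSVC, a narrow file name is typically interpreted in the active ANSI code page rather than as UTF-8, so this version is only expected to work as-is on POSIX-like systems.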

But what about c_str(), which returns a const pointer to the file name string followed by a null terminator? I thought a UTF-8 byte sequence could contain zero bytes as part of a code point, in which case such a string would be truncated.

So please tell me where I'm wrong, or suggest a portable solution if I'm right.

Thank you.


Solution

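  • UTF-8 never uses a zero byte inside a multi-byte sequence. Every byte of a multi-byte sequence has its most significant bit set (lead bytes are 11xxxxxx, continuation bytes are 10xxxxxx), so the only 0x00 byte in UTF-8 text is the encoding of U+0000 itself. UTF-8 text can therefore be null-terminated just like ASCII text.

    So you can pass path_name.c_str() directly as a UTF-8 file name.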
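The property the answer relies on can be checked directly. The sketch below, with hypothetical helper names, verifies that the bytes of non-ASCII characters encoded in UTF-8 all have the high bit set, and that such a string survives c_str() intact, whereas a string with an embedded 0x00 byte does not. It assumes the input is well-formed UTF-8; POSIX file systems themselves do not enforce that.

```cpp
#include <cstring>
#include <string>

// True if every byte has its most significant bit set. This holds for any
// text made only of non-ASCII characters encoded in well-formed UTF-8,
// because lead bytes are 11xxxxxx and continuation bytes are 10xxxxxx.
bool all_high_bit_set(const std::string& bytes)
{
    for (unsigned char b : bytes)
        if ((b & 0x80) == 0)
            return false;
    return true;
}

// True if c_str() exposes the whole string, i.e. there is no embedded 0x00
// byte that would make a C API such as fopen() see a truncated name.
bool survives_c_str(const std::string& s)
{
    return std::strlen(s.c_str()) == s.size();
}
```

For example, "\xe8\xa8\xad\xe5\xae\x9a" (two CJK characters) and "\xf0\x9f\x93\x81" (U+1F4C1) pass both checks, while std::string("ab\0c", 4) fails survives_c_str.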