UTF-8 in Windows


How do I set the code page to UTF-8 in a C Windows program?

I have a third party library that uses fopen to open files. I can use wcstombs to convert my Unicode filenames to the current code page, however if the user has a filename with a character outside the code page then this breaks.

Ideally I would just call _setmbcp(65001) to set the code page to UTF-8, however the MSDN documentation for _setmbcp states that UTF-8 is not supported.

How can I get around this?


Solution

  • Unfortunately, there is no way to make UTF-8 the current code page in Windows. The CP_UTF7 and CP_UTF8 constants are pseudo-code pages, used only in the MultiByteToWideChar and WideCharToMultiByte conversion functions, as Ben mentioned.

    Your problem is similar to that of the C++ fstream classes. The fstream constructors accept only char* names, making it impossible to open a file with a true Unicode name. The only solution offered by VC was a hack: open the file separately and then attach the handle to the stream object. I'm afraid this isn't an option for you, since the third-party library probably doesn't accept handles.

    The only solution I can think of is to create a temporary file name that contains no non-ASCII characters, hard-link it to the original file, and pass that name to the library instead.
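
A rough sketch of the hard-link idea, in case it helps. The helper names (is_plain_ascii, widen_ascii, make_ascii_hard_link) are made up for illustration and are not part of any library; only CreateHardLinkW is a real Win32 call. It assumes the link and the original file live on the same NTFS volume, since hard links cannot cross volumes, and that the caller picks a link name that does not already exist.

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

#ifdef _WIN32
#include <windows.h>
#endif

/* Hypothetical helper: returns 1 if every byte of s is 7-bit ASCII,
   so the name survives conversion to any ANSI code page. */
static int is_plain_ascii(const char *s)
{
    for (; *s != '\0'; ++s) {
        if ((unsigned char)*s > 0x7F)
            return 0;
    }
    return 1;
}

/* Hypothetical helper: widens an ASCII string into dst.
   Returns 0 if src is not pure ASCII or does not fit in dst_len. */
static int widen_ascii(const char *src, wchar_t *dst, size_t dst_len)
{
    size_t i = 0;

    assert(src != NULL && dst != NULL);
    if (dst_len == 0 || !is_plain_ascii(src))
        return 0;
    for (; src[i] != '\0'; ++i) {
        if (i + 1 >= dst_len)
            return 0;               /* no room for the terminator */
        dst[i] = (wchar_t)(unsigned char)src[i];
    }
    dst[i] = L'\0';
    return 1;
}

#ifdef _WIN32
/* Hypothetical helper: creates ascii_link as a hard link to unicode_path.
   Returns 1 on success, 0 on failure (see GetLastError for details). */
static int make_ascii_hard_link(const wchar_t *unicode_path,
                                const char *ascii_link)
{
    wchar_t wide_link[MAX_PATH];

    if (!widen_ascii(ascii_link, wide_link, MAX_PATH))
        return 0;
    /* First argument is the new link, second is the existing file. */
    return CreateHardLinkW(wide_link, unicode_path, NULL) ? 1 : 0;
}
#endif
```

After make_ascii_hard_link succeeds, the ASCII name can be handed to the third-party library, which will fopen it as usual. Once the library has closed the file, remove the link with DeleteFileA; the original file is unaffected because only the extra name is deleted.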