When P/Invoking the Windows API function CreateFile from a C# program, what is the best practice: calling the generic CreateFile, the ANSI CreateFileA, or the Unicode CreateFileW version?
Each of the APIs has a different signature for the relevant CharSet:
// CreateFile generic
[DllImport("kernel32.dll", SetLastError = true, CharSet = CharSet.Auto)]
public static extern SafeFileHandle CreateFile (
[MarshalAs(UnmanagedType.LPTStr)] string lpFileName,
...
// CreateFileA ANSI
[DllImport("kernel32.dll", SetLastError = true, CharSet = CharSet.Ansi)]
public static extern SafeFileHandle CreateFileA (
[MarshalAs(UnmanagedType.LPStr)] string lpFileName,
...
// CreateFileW Unicode
[DllImport("kernel32.dll", SetLastError = true, CharSet = CharSet.Unicode)]
public static extern SafeFileHandle CreateFileW (
[MarshalAs(UnmanagedType.LPWStr)] string lpFileName,
...
According to Microsoft documentation [1], the default CharSet for C# is CharSet.Ansi. That seems really strange, since strings in C# are Unicode. If the documentation is right, it means that CreateFile will ultimately call CreateFileA at runtime (with appropriate conversions to and from ANSI along the way).
Another Microsoft doc [2] says, "When the CharSet is Unicode or the argument is explicitly marked as [MarshalAs(UnmanagedType.LPWSTR)] and the string is passed by value (not ref or out), the string will be pinned and used directly by native code (rather than copied)." This seems great for avoiding copies of potentially large strings and getting maximum performance.
Assume that I want to call the CreateFile flavor that works optimally with C# strings, has best performance, minimal casting / translations, works on Windows x64 OS and secondarily has maximum portability.
Approach 1: Call generic CreateFile but change signature to CharSet.Unicode.
This may be a problem because CreateFile marshals lpFileName as UnmanagedType.LPTStr, whereas CreateFileW marshals it as UnmanagedType.LPWStr. It seems like the marshaling would have to perform conversions (possibly more than once) to get the right LP type. Another inefficiency is that CreateFile would have to call CreateFileW internally. Also, I want to make sure the "pinning" is happening for max performance, and I'm not sure that would happen here.
Approach 2: Call generic CreateFile with signature CharSet.Auto. This seems to provide maximum portability for the target OS but will wind up calling CreateFileA internally, which is inappropriate for C# strings (Unicode).
Approach 3: Call CreateFileW directly. This also seems less than optimal, because if I am compiling for a different target OS like Win x86 (that uses only ANSI strings), then the program will not be able to run at all.
It seems like Approach 1 would be best but the MarshalAs LPTStr doesn't look right to me (considering that the CreateFileW version marshals as LPWStr).
I would appreciate any help you can give on this. I have been digging through dozens of conflicting webpages and cannot find a definitive answer.
References:
[1] DllImportAttribute.CharSet Field
Windows uses UTF-16 LE character encoding internally [1]. When you call the ANSI version of a Windows API, the system will convert the input to UTF-16 (using the active ANSI code page), call into the Unicode version, and convert the output back to ANSI encoding. This is both needlessly costly and lossy: not every Unicode string can be represented using ANSI encoding. The conversion also imposes arbitrary size limitations on input and output buffers (CreateFileA limits the file name length to 260 ANSI code units).
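To make the "lossy" part concrete, here is a minimal sketch of what a Unicode to ANSI to Unicode round trip does to a file name. It assumes ISO-8859-1 (code page 28591) as a stand-in for whatever ANSI code page is active on a given machine; the class name AnsiRoundTrip and the sample file names are made up for illustration.

```csharp
using System;
using System.Text;

public static class AnsiRoundTrip
{
    // Encodes a string with a single-byte encoding and decodes it again,
    // imitating the conversions implied by calling an "A" function.
    public static string RoundTrip(string text)
    {
        // Assumption: Latin-1 (code page 28591) stands in for the ANSI code page.
        Encoding singleByte = Encoding.GetEncoding(28591);
        byte[] ansiBytes = singleByte.GetBytes(text);
        return singleByte.GetString(ansiBytes);
    }

    public static void Main()
    {
        // Hypothetical file names: one plain ASCII, one Cyrillic.
        string[] names = { "report.txt", "файл.txt" };
        foreach (string name in names)
        {
            bool survived = RoundTrip(name) == name;
            Console.WriteLine(survived
                ? name + ": survives the round trip"
                : name + ": is altered by the round trip");
        }
    }
}
```

Characters that the single-byte encoding cannot represent are replaced during encoding, so the name that reaches the file system is no longer the one the program asked for.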
With this in mind, you will want to make sure to always call the Unicode version of the Windows API. This provides maximum performance on all supported versions of Windows and guards against loss of information when converting from Unicode to ANSI. Whether you use CharSet.Auto and MarshalAs(UnmanagedType.LPTStr) or CharSet.Unicode and MarshalAs(UnmanagedType.LPWStr) amounts to the same [2], and is a matter of personal preference. Microsoft recommends being explicit, i.e. explicitly naming the Unicode version (CreateFileW) and specifying Unicode encoding as well as wide character string types (the 3rd option in your question).
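A sketch of what the explicit variant might look like in full. The parameter list follows the documented Win32 prototype of CreateFileW rather than the elided declarations in the question, the constants are the values from the Windows SDK headers, and OpenForRead is a hypothetical helper added only to show a call. ExactSpelling = true is an optional extra that tells the runtime to bind to the name exactly as written instead of probing for a suffixed variant.

```csharp
using System;
using System.ComponentModel;
using System.Runtime.InteropServices;
using Microsoft.Win32.SafeHandles;

public static class NativeMethods
{
    // Values as defined in the Windows SDK headers.
    public const uint GENERIC_READ = 0x80000000;
    public const uint FILE_SHARE_READ = 0x00000001;
    public const uint OPEN_EXISTING = 3;
    public const uint FILE_ATTRIBUTE_NORMAL = 0x00000080;

    // Binds directly to the wide-character export; no ANSI conversion is involved.
    [DllImport("kernel32.dll", EntryPoint = "CreateFileW", ExactSpelling = true,
               SetLastError = true, CharSet = CharSet.Unicode)]
    public static extern SafeFileHandle CreateFileW(
        [MarshalAs(UnmanagedType.LPWStr)] string lpFileName,
        uint dwDesiredAccess,
        uint dwShareMode,
        IntPtr lpSecurityAttributes,
        uint dwCreationDisposition,
        uint dwFlagsAndAttributes,
        IntPtr hTemplateFile);

    // Hypothetical convenience wrapper: opens an existing file for reading
    // and turns a failed call into an exception carrying the Win32 error code.
    public static SafeFileHandle OpenForRead(string path)
    {
        SafeFileHandle handle = CreateFileW(
            path, GENERIC_READ, FILE_SHARE_READ, IntPtr.Zero,
            OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, IntPtr.Zero);

        if (handle.IsInvalid)
        {
            throw new Win32Exception(Marshal.GetLastWin32Error());
        }
        return handle;
    }
}
```

On Windows, a caller would wrap the result in a using statement, for example around NativeMethods.OpenForRead(path), so that the handle is released when it goes out of scope.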
[1] With the exception of Windows 95/98/ME, collectively referred to as Win9x. None of them are officially supported.
[2] CharSet.Auto "chooses between ANSI and Unicode formats at run time, based on the target platform", so it isn't identical to CharSet.Unicode in theory. However, all supported platforms use Unicode encoding in practice.