c#, c++, .net-core, pinvoke

.NET Core StringBuilder encoding on Mac


I have the following C++ function:

LIBRARY_API HRESULT LA_CC GetRSAKeyPair(STRTYPE privateKeyPtr, STRTYPE publicKeyPtr)
{
    int length = 2048;
    string privateKey, publicKey;
    if (GenerateRSAKeyPair(privateKey, publicKey) == false)
    {
        return FAIL;
    }
#ifndef _WIN32
    *privateKeyPtr = '\0'; // assumes `dest_size > 0`
    strncat(privateKeyPtr, privateKey.c_str(), length);
    *publicKeyPtr = '\0'; // assumes `dest_size > 0`
    strncat(publicKeyPtr, publicKey.c_str(), length);
#else
    *privateKeyPtr = L'\0'; // assumes `dest_size > 0`
    wcsncat(privateKeyPtr, toUTF16(privateKey).c_str(), length);
    *publicKeyPtr = L'\0'; // assumes `dest_size > 0`
    wcsncat(publicKeyPtr, toUTF16(publicKey).c_str(), length);
#endif
    return OK;
}

It is invoked via P/Invoke:

[DllImport(DLL_FILE_NAME, CharSet = CharSet.Unicode, CallingConvention = CallingConvention.Cdecl)]
public static extern int GetRSAKeyPair(StringBuilder privateKey, StringBuilder publicKey);

And when used:

StringBuilder privateKey = new StringBuilder(2048);
StringBuilder publicKey = new StringBuilder(2048);
LibraryNative.GetRSAKeyPair(privateKey, publicKey);
Console.WriteLine(privateKey);
Console.WriteLine(publicKey);

On macOS the output comes back as Chinese characters, though the same code works fine on Windows. The native library assumes the encoding is not UTF-16 on non-Windows platforms, but it seems StringBuilder expects UTF-16 on *nix too.


Solution

  • The explanation can be found in the conditional code in your C++. There you opt to use an 8-bit encoding on platforms other than Windows, but UTF-16 on Windows. The C# code, on the other hand, explicitly states that the text is UTF-16 encoded, through the use of CharSet.Unicode.

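    This mismatch explains the behaviour you observe: the native code writes one byte per character, while the marshaler reads the buffer back two bytes at a time, and pairs of ASCII bytes mostly decode to CJK ideographs. How to fix it? Well, you need to make sure that both sides of the interface, managed and unmanaged, use the same encoding.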

    Your current strategy of a different encoding for different platforms seems to be asking for trouble to me. That will just lead to confusion. My advice is to pick a single encoding for interop. For instance, UTF-16 would be a sound choice that makes the interop simplest, because the pinvoke marshaler understands it natively.
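To make both points concrete, here is a minimal C++ sketch. It is not the library's actual code: `MisreadBytesAsUtf16` and `CopyAsciiAsUtf16` are hypothetical helper names, and the sketch assumes the keys are plain ASCII (for example PEM text), so that each byte maps to exactly one UTF-16 code unit. The first function mimics what happens when 8-bit text is read back as little-endian UTF-16; the second shows one way the native side might fill a UTF-16 buffer on every platform.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: reinterprets 8-bit text as little-endian UTF-16 code
// units, which is roughly what the managed side sees when native code writes
// char data into a buffer declared as CharSet.Unicode.
std::u16string MisreadBytesAsUtf16(const std::string& bytes)
{
    std::u16string out;
    for (std::size_t i = 0; i + 1 < bytes.size(); i += 2) {
        const unsigned char lo = static_cast<unsigned char>(bytes[i]);
        const unsigned char hi = static_cast<unsigned char>(bytes[i + 1]);
        out.push_back(static_cast<char16_t>(lo | (hi << 8)));
    }
    return out;
}

// Hypothetical helper: copies ASCII text into a caller-supplied UTF-16 buffer.
// `capacity` is the buffer size in char16_t units, including the terminator.
// Assumption: the source is pure ASCII; general UTF-8 input would need a real
// UTF-8 to UTF-16 conversion instead of this byte-by-byte widening.
bool CopyAsciiAsUtf16(const std::string& src, char16_t* dest, std::size_t capacity)
{
    if (dest == nullptr || capacity == 0) {
        return false;
    }
    dest[0] = u'\0'; // leave an empty string behind on any failure
    if (src.size() + 1 > capacity) {
        return false; // text plus terminator does not fit
    }
    for (std::size_t i = 0; i < src.size(); ++i) {
        const unsigned char c = static_cast<unsigned char>(src[i]);
        if (c > 0x7F) {
            dest[0] = u'\0';
            return false; // non-ASCII byte: outside this sketch's assumption
        }
        dest[i] = static_cast<char16_t>(c);
    }
    dest[src.size()] = u'\0';
    return true;
}
```

With something along these lines, the exported function would take `char16_t*` parameters on all platforms and the C# declaration could keep `CharSet.Unicode`. Note that `wchar_t` is 32 bits wide on macOS and Linux, so the `wcsncat` branch cannot simply be reused there; `char16_t` gives a 16-bit code unit everywhere.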