Tags: c#, character-encoding, marshalling, securestring

C# convert SecureString to UTF-8 byte[] securely


I'm trying to get a SecureString into a UTF-8 encoded byte[] that I can keep pinned, so the GC can't move it. I've managed this with UTF-16 (the encoding .NET uses internally for strings, and the one a BSTR holds), but I can't figure out how to convert the encoding without risking the GC making a managed copy of the data somewhere (the data needs to be kept secure).
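Why the encoding matters for a hash can be shown with an ordinary, non-secret string. The sketch below is an illustration only (the class name `EncodingComparison` and the sample text `"h\u00e9llo"`, i.e. "héllo", are arbitrary choices): the same five characters encode to 10 bytes in UTF-16 but 6 bytes in UTF-8, so the two digests differ.

```csharp
using System;
using System.Linq;
using System.Security.Cryptography;
using System.Text;

public static class EncodingComparison
{
    // Hashes the same text under two encodings. Illustration only: it uses a
    // plain string, which is exactly what real secret handling tries to avoid.
    public static (int Utf16Length, int Utf8Length, bool SameDigest) Compare(string text)
    {
        byte[] utf16 = Encoding.Unicode.GetBytes(text); // UTF-16 little-endian, no BOM
        byte[] utf8 = Encoding.UTF8.GetBytes(text);

        using SHA256 sha = SHA256.Create();
        byte[] utf16Digest = sha.ComputeHash(utf16);
        byte[] utf8Digest = sha.ComputeHash(utf8);

        return (utf16.Length, utf8.Length, Enumerable.SequenceEqual(utf16Digest, utf8Digest));
    }
}
```

For `Compare("h\u00e9llo")` the result is `(10, 6, false)`: 'é' takes two bytes in both encodings, but each ASCII letter takes two bytes in UTF-16 and one in UTF-8.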

Here's what I have so far (context: an extension method that computes the hash of a SecureString):

using System;
using System.Runtime.InteropServices;
using System.Security;
using System.Security.Cryptography;
using System.Text;

// (must be declared inside a static class, since it is an extension method)
public static byte[] Hash(this SecureString secureString, HashAlgorithm hashAlgorithm)
{
  IntPtr bstr = Marshal.SecureStringToBSTR(secureString);
  // The 4 bytes just before a BSTR's data hold its length in bytes (not chars).
  int length = Marshal.ReadInt32(bstr, -4);
  var utf16Bytes = new byte[length];
  GCHandle utf16BytesPin = GCHandle.Alloc(utf16Bytes, GCHandleType.Pinned);
  byte[] utf8Bytes = null;

  try
  {
    Marshal.Copy(bstr, utf16Bytes, 0, length);
    Marshal.ZeroFreeBSTR(bstr);
    bstr = IntPtr.Zero;
    // At this point I have the UTF-16 byte[] perfectly.
    // The next line converts the encoding correctly, but Encoding.Convert
    // returns a new, unpinned array, so nothing stops the GC from moving
    // (and thereby copying) the UTF-8 data around memory.
    utf8Bytes = Encoding.Convert(Encoding.Unicode, Encoding.UTF8, utf16Bytes);
    return hashAlgorithm.ComputeHash(utf8Bytes);
  }
  finally
  {
    if (bstr != IntPtr.Zero)
    {
      Marshal.ZeroFreeBSTR(bstr); // frees the BSTR if Marshal.Copy threw first
    }
    if (utf8Bytes != null)
    {
      Array.Clear(utf8Bytes, 0, utf8Bytes.Length);
    }
    Array.Clear(utf16Bytes, 0, utf16Bytes.Length);
    utf16BytesPin.Free();
  }
}

What's the best way to do this conversion? Am I doing it in the right place, or should it happen earlier? And could this be more memory-efficient by skipping the UTF-16 byte[] step entirely?
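The `Marshal.ReadInt32(bstr, -4)` call in the question's code relies on the BSTR layout: a 4-byte prefix immediately before the character data holding the string's length in bytes (excluding the terminating null). That also means the BSTR already contains the UTF-16 bytes in unmanaged memory. A small sketch that checks this assumption (`BstrLayout` and `Inspect` are hypothetical names used only here):

```csharp
using System;
using System.Runtime.InteropServices;
using System.Security;

public static class BstrLayout
{
    // Returns the SecureString's char count alongside the BSTR's length prefix.
    // The prefix counts bytes, so for UTF-16 data it is twice the char count.
    public static (int CharCount, int PrefixBytes) Inspect(SecureString secureString)
    {
        IntPtr bstr = Marshal.SecureStringToBSTR(secureString);
        try
        {
            return (secureString.Length, Marshal.ReadInt32(bstr, -4));
        }
        finally
        {
            // Zero and free the plaintext copy even if reading the prefix fails.
            Marshal.ZeroFreeBSTR(bstr);
        }
    }
}
```

For a SecureString built from the five characters of `"h\u00e9llo"`, `Inspect` reports 5 chars and a prefix of 10 bytes.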


Solution

  • I've found a way to do this the way I wanted. The code isn't finished (it still needs better exception handling, and cleanup of memory in the failure case), but here it is:

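    using System;
    using System.Runtime.InteropServices;
    using System.Security;
    using System.Security.Cryptography;
    using System.Text;
    
    // Windows-only. RtlZeroMemory's length parameter is a SIZE_T,
    // so it is declared as UIntPtr rather than int.
    [DllImport("kernel32.dll")]
    static extern void RtlZeroMemory(IntPtr dst, UIntPtr length);
    
    // Requires unsafe code to be enabled (AllowUnsafeBlocks).
    public unsafe static byte[] HashNew(this SecureString secureString, HashAlgorithm hashAlgorithm)
    {
      IntPtr bstr = Marshal.SecureStringToBSTR(secureString);
      int maxUtf8BytesCount = Encoding.UTF8.GetMaxByteCount(secureString.Length);
      IntPtr utf8Buffer = Marshal.AllocHGlobal(maxUtf8BytesCount);
    
      // Here's the magic: encode straight from the unmanaged UTF-16 BSTR into
      // the unmanaged UTF-8 buffer, so no managed UTF-16 copy is ever created.
      char* utf16CharsPtr = (char*)bstr.ToPointer();
      byte* utf8BytesPtr  = (byte*)utf8Buffer.ToPointer();
      int utf8BytesCount = Encoding.UTF8.GetBytes(utf16CharsPtr, secureString.Length, utf8BytesPtr, maxUtf8BytesCount);
    
      Marshal.ZeroFreeBSTR(bstr);
      // Pin the managed array before any plaintext is copied into it.
      var utf8Bytes = new byte[utf8BytesCount];
      GCHandle utf8BytesPin = GCHandle.Alloc(utf8Bytes, GCHandleType.Pinned);
      Marshal.Copy(utf8Buffer, utf8Bytes, 0, utf8BytesCount);
      RtlZeroMemory(utf8Buffer, new UIntPtr((uint)utf8BytesCount));
      Marshal.FreeHGlobal(utf8Buffer);
      try
      {
        return hashAlgorithm.ComputeHash(utf8Bytes);
      }
      finally
      {
        // Wipe the pinned plaintext before releasing the pin.
        Array.Clear(utf8Bytes, 0, utf8Bytes.Length);
        utf8BytesPin.Free();
      }
    }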
    

    It relies on obtaining pointers to both the original UTF-16 BSTR and an unmanaged UTF-8 buffer, then using Encoding.UTF8.GetBytes(Char*, Int32, Byte*, Int32) to keep the conversion entirely within unmanaged memory.
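The answer notes that its code still lacks proper exception handling. One possible way to finish it is sketched below; it is not the author's final code. Every resource is released in a single `finally`, and, as a variation on the answer's approach, the intermediate `AllocHGlobal` buffer is dropped: `Encoding.UTF8.GetByteCount(char*, int)` sizes a managed array exactly, and the array is pinned before any plaintext is written into it, leaving only one UTF-8 copy to wipe. `SecureStringHashing` and `HashUtf8` are hypothetical names, and unsafe code must be enabled. The sketch only controls the copies it makes itself; any internal buffering by the `HashAlgorithm` implementation is outside its reach.

```csharp
using System;
using System.Runtime.InteropServices;
using System.Security;
using System.Security.Cryptography;
using System.Text;

public static class SecureStringHashing
{
    // Hashes the UTF-8 bytes of a SecureString. Plaintext only ever lives in
    // the unmanaged BSTR or in a pinned array, and both are wiped in finally.
    public static unsafe byte[] HashUtf8(this SecureString secureString, HashAlgorithm hashAlgorithm)
    {
        if (secureString == null) throw new ArgumentNullException(nameof(secureString));
        if (hashAlgorithm == null) throw new ArgumentNullException(nameof(hashAlgorithm));

        IntPtr bstr = IntPtr.Zero;
        byte[] utf8Bytes = null;
        GCHandle pin = default;
        try
        {
            bstr = Marshal.SecureStringToBSTR(secureString);
            char* chars = (char*)bstr.ToPointer();
            int charCount = secureString.Length;

            // Exact UTF-8 size, computed straight from the unmanaged UTF-16 data.
            int byteCount = charCount == 0 ? 0 : Encoding.UTF8.GetByteCount(chars, charCount);

            // Pin before writing, so the GC can never relocate (copy) the secret.
            utf8Bytes = new byte[byteCount];
            pin = GCHandle.Alloc(utf8Bytes, GCHandleType.Pinned);
            if (byteCount > 0)
            {
                byte* bytes = (byte*)pin.AddrOfPinnedObject().ToPointer();
                Encoding.UTF8.GetBytes(chars, charCount, bytes, byteCount);
            }

            return hashAlgorithm.ComputeHash(utf8Bytes);
        }
        finally
        {
            // Runs on success and on failure: wipe, unpin, then zero-free the BSTR.
            if (utf8Bytes != null) Array.Clear(utf8Bytes, 0, utf8Bytes.Length);
            if (pin.IsAllocated) pin.Free();
            if (bstr != IntPtr.Zero) Marshal.ZeroFreeBSTR(bstr);
        }
    }
}
```

Usage mirrors the answer's method, e.g. `byte[] digest = password.HashUtf8(sha256);`, and the result matches hashing `Encoding.UTF8.GetBytes` of the same text.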