
Why does using FieldOffset(0) in C# end up with different pointers for char array and string?


As a follow-up to the good answer about string immutability (https://stackoverflow.com/a/37253663/6619353), I started experimenting with this technique to understand the offset of the modifiable bytes.

I eventually discovered that applying [FieldOffset(0)] to two reference-type fields does not give both pointers the same value.

Here is the test:

using System;
using System.Runtime.InteropServices;

namespace ConsoleApp
{
    [StructLayout(LayoutKind.Explicit)]
    public struct MutableString
    {
        [FieldOffset(0)] 
        public readonly string AsString;

        [FieldOffset(0)] 
        public readonly char[] AsCharArray;

        public MutableString(string original)
        {
            AsCharArray = null;
            AsString = original;
        }
    }

    public static class Program
    {
        public static unsafe void Main(string[] args)
        {
            var mutableString = new MutableString("test");

            fixed (char* pString = mutableString.AsString, pCharArray = mutableString.AsCharArray)
            {
                Console.WriteLine((long)pString);    // 2229380919860
                Console.WriteLine((long)pCharArray); // 2229380919864
            }
        }
    }
}

The code above prints two different numbers (the exact values differ from run to run, of course).

The difference is always 4 bytes (2 chars).

Here is the csproj file:

<Project Sdk="Microsoft.NET.Sdk">
    <PropertyGroup>
        <OutputType>Exe</OutputType>
        <TargetFrameworks>netcoreapp2.0;net47</TargetFrameworks>
        <AllowUnsafeBlocks>true</AllowUnsafeBlocks>
    </PropertyGroup>
</Project>

The behavior is the same for the .NET Core DLL and the net47 EXE, in Debug and Release, and in x86 and x64 build configurations.

The host machine is Windows 10 x64.

I'm wondering how it is possible that after assigning only the AsString field I'm getting another value in the field with the same offset?


Solution

  • I'm wondering how it is possible that after assigning only the AsString field I'm getting another value in the field with the same offset?

    You're not.

    If you compare the two field values with object.ReferenceEquals(mutableString.AsString, mutableString.AsCharArray), you'll find the two fields are equal, just as expected.
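A minimal sketch of that check, reusing the MutableString struct from the question. The ReferenceCheck class and its FieldsAlias helper are names made up for this illustration; they are not part of the original code.

```csharp
using System;
using System.Runtime.InteropServices;

[StructLayout(LayoutKind.Explicit)]
public struct MutableString
{
    // Both fields occupy the same slot, so they hold the same reference bits.
    [FieldOffset(0)] public readonly string AsString;
    [FieldOffset(0)] public readonly char[] AsCharArray;

    public MutableString(string original)
    {
        AsCharArray = null;
        AsString = original;
    }
}

public static class ReferenceCheck
{
    // Hypothetical helper: true when both overlapping fields
    // refer to the very same object.
    public static bool FieldsAlias(MutableString value) =>
        object.ReferenceEquals(value.AsString, value.AsCharArray);

    public static void Main()
    {
        var mutableString = new MutableString("test");

        // No fixed statement and no pointer conversion here,
        // only a comparison of the stored references.
        Console.WriteLine(FieldsAlias(mutableString));
    }
}
```

This needs no unsafe code, which helps separate the value stored in the struct from whatever the fixed statement later computes from it.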

    What's tripping you up is the implicit conversion from the string and char[] types to the pointer. Both of those types are managed types, so the fixed statement has to pin the objects and return an appropriate pointer to the char data. It's this conversion that's going wrong, not the value actually stored in your struct.

    As for why the conversion goes wrong, it's due to differences in array padding between 64-bit and 32-bit processes. The difference appears between .NET Framework (desktop) and Core because the default project settings are different: desktop projects default to preferring 32-bit, while Core projects do not. In fact, the "Prefer 32-bit" checkbox in VS doesn't appear to do anything for Core projects; I had to explicitly set the platform target to x86 to get a 32-bit process for Core.

    In a 64-bit process, the implicit conversion from char[] to a pointer expects 4 extra bytes of padding before the element data. But since your reference actually points to a string object rather than a char[] object, that padding isn't present, and so the pointer winds up 4 bytes too far.
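The arithmetic can be sketched as follows. The header layouts in the comments are assumptions about the current CLR implementation (a method table pointer followed by a 4-byte length, with arrays padded up to pointer alignment); the runtime does not guarantee them, and the LayoutSketch class is purely illustrative.

```csharp
using System;

public static class LayoutSketch
{
    // Assumed string layout: [method table pointer][int length][chars...]
    public static int StringDataOffset(int pointerSize) =>
        pointerSize + sizeof(int);

    // Assumed char[] layout:
    // [method table pointer][int length][padding to pointer size][elements...]
    public static int ArrayDataOffset(int pointerSize) =>
        2 * pointerSize;

    // How far past the real character data the char[] conversion lands
    // when the object is actually a string.
    public static int Mismatch(int pointerSize) =>
        ArrayDataOffset(pointerSize) - StringDataOffset(pointerSize);

    public static void Main()
    {
        Console.WriteLine(Mismatch(8)); // 64-bit: 16 - 12 = 4
        Console.WriteLine(Mismatch(4)); // 32-bit: 8 - 8 = 0
    }
}
```

Under these assumptions the two conversions disagree by 4 bytes (2 chars) in a 64-bit process and agree in a 32-bit one.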

    Given that there's really no reason to expect a reference to a string object to be valid when reinterpreted as a reference to a char[] object (the object layouts are coincidentally compatible in a 32-bit process, but that's an implementation detail, not something the .NET spec promises), I view this behavior as "reasonable". If you want to create what is effectively a union data structure, you have to put your own safeguards in place to make sure that you only ever read the unioned field as the type you actually set.