
Passing strings/arrays within structures between C++/C#


I am passing a struct from C# to C++.

C# code:

[StructLayout(LayoutKind.Sequential, Pack = 8)]
public struct Data
{
    [MarshalAs(UnmanagedType.U4)]
    public int number;

    [MarshalAs(UnmanagedType.ByValArray, SizeConst = 5)]
    public int[] array;

    [MarshalAs(UnmanagedType.ByValTStr, SizeConst = 512)]
    public string buffer;
}

C++ code:

struct Data
{
public:
    int number;
    int array[5];
    char buffer[512];
    //char *buffer;
};

The above method works fine. But if I use pointers to handle the data in C++ instead, I get this error:

Unhandled Exception: System.AccessViolationException: Attempted to read or write protected memory

struct Data
{
public:
    int number;
    int *array;
    char *buffer;
};

Why can't I use pointers here? Is handling this case via pointers advantageous?


Solution

  • The problem is how your data is represented in memory.

    Let's assume you have an instance of the C# structure that is marshaled to unmanaged code, or even to a file.

    [StructLayout(LayoutKind.Sequential, Pack = 8)]
    public struct Data
    {
        [MarshalAs(UnmanagedType.U4)]
        public int number;

        [MarshalAs(UnmanagedType.ByValArray, SizeConst = 5)]
        public int[] array;

        [MarshalAs(UnmanagedType.ByValTStr, SizeConst = 512)]
        public string buffer;
    }

    // Sample values, set on an instance (struct field initializers would not compile here):
    var data = new Data
    {
        number = 5,
        array = new[] { 0, 1, 2, 3, 4 },
        buffer = "Happy new Year"
    };
    

    With the default marshaling, the unmanaged memory for this instance looks like this (hex view, 8 bytes per line):

    05 00 00 00 00 00 00 00
    01 00 00 00 02 00 00 00
    03 00 00 00 04 00 00 00
    48 61 70 70 79 20 6E 65
    77 20 59 65 61 72 00 ...
    

    The first four bytes, "05 00 00 00", hold the value 5 of the "number" field. (The bytes appear in reverse order because x86/x64 processors are little-endian; see Endianness for details.)

    The next five integers belong to "array": "00 00 00 00" = 0, "01 00 00 00" = 1, "02 00 00 00" = 2, "03 00 00 00" = 3, "04 00 00 00" = 4.

    The string "buffer" starts right after them, at offset 24, with one byte per character:

    "48" = H
    "61" = a
    "70" = p
    "70" = p
    "79" = y
    "20" = <space>
    "6E" = n
    "65" = e
    "77" = w
    "20" = <space>
    "59" = Y
    "65" = e
    "61" = a
    "72" = r
    "00" = terminating null
    

    One subtlety: .NET stores strings internally as UTF-16, where every character takes two bytes, but ByValTStr marshals the string using the CharSet of the containing struct. StructLayout defaults to CharSet.Ansi, so each character is written as a single byte, as shown above. If you declare the struct with CharSet = CharSet.Unicode, each character takes two bytes instead ("48 00" = H, in little-endian order).

    Now, for this C++ struct

    struct Data
    {
    public:
        int number;
        int array[5];
        char buffer[512];
        //char *buffer;
    };
    

    sizeof(int) is 4, so the memory for "number" is "05 00 00 00", the number five. array[0], array[1], array[2], array[3] and array[4] lie on the blocks "00 00 00 00" = 0, "01 00 00 00" = 1, "02 00 00 00" = 2, "03 00 00 00" = 3 and "04 00 00 00" = 4. Everything after that belongs to buffer[512]. In C++, sizeof(char) == 1, which matches the one-byte characters written by the default CharSet.Ansi marshaling; that is why this version works. If the C# struct is declared with CharSet = CharSet.Unicode, use wchar_t buffer[512] instead (wchar_t is 2 bytes on Windows, matching UTF-16).

    Now let's take a look at

    struct Data
    {
    public:
        int number;
        int *array;
        char *buffer;
    };
    

    This structure is laid over the same block of memory. In a 32-bit process (Win32), the "array" pointer takes 4 bytes and reads "00 00 00 00" (a null pointer), and the "buffer" pointer reads "01 00 00 00".

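    In a 64-bit process (Win64), pointers are 8 bytes and must be 8-byte aligned, so the compiler inserts 4 bytes of padding after "number". The "array" pointer therefore reads "01 00 00 00 02 00 00 00" (0x0000000200000001) and the "buffer" pointer reads "03 00 00 00 04 00 00 00" (0x0000000400000003).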

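    These are invalid pointers (null, or small addresses the process does not own), so you get an access violation as soon as you dereference them. The pointer version can only work if the fields hold addresses of memory that was actually allocated for the data; ByValArray and ByValTStr do not produce that, because they embed the data inline in the struct.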