c#.net arrays performance initialization

Why is creating an array with inline initialization so slow?

Why is inline array initialization so much slower than doing so iteratively? I ran this program to compare them and the single initialization takes many times longer than doing so with a for loop.

Here's the program I wrote in LinqPad to test this.

var iterations = 100000000;
var length = 4;

{
    var timer = System.Diagnostics.Stopwatch.StartNew();

    for(int i = 0; i < iterations; i++){
        var arr = new int[] { 1, 2, 3, 4 };
    }
    timer.Stop();
    "Array- Single Init".Dump();
    timer.Elapsed.Dump();
}

{
    var timer = System.Diagnostics.Stopwatch.StartNew();

    for(int i = 0; i < iterations; i++){
        var arr = new int[length];
        for(int j = 0; j < length; j++){
            arr[j] = j;
        }
    }
    timer.Stop();
    "Array- Iterative".Dump();
    timer.Elapsed.Dump();
}

Results:

Array - Single Init
00:00:26.9590931

Array - Iterative
00:00:02.0345341

I also ran this on VS2013 Community Edition and the latest VS2015 preview on a different PC and got similar results to my LinqPad results.

I ran the code in Release mode (i.e.: compiler optimizations on), and got very different results from above. The two code blocks came out very similar this time. This seems to indicate it is a compiler optimization issue.

Array - Single Init
00:00:00.5511516

Array - Iterative
00:00:00.5882975

Solution

First of all, profiling at the C# level will give us nothing since it will show us the C# code line which takes longest to execute which is of course the inline array initialization, but for the sport:

Now when we see the expected results, lets Observe the code at the IL Level and try to see what is different between the initializations of the 2 arrays:

First of all we will look at the standard array initialization:

Everything looks good, the loop is doing exactly what we expect with no noticeable overhead.
Now let's take a look at the inline array initialization:
- The first 2 lines are creating an array at the size of 4.
- The third line duplicates the generated array's pointer onto the evaluation stack.
- The last line set's the array-local to the array that was just created.

Now we will focus on the 2 remaining lines:

The first line (L_001B) loads some Compilation-Time-Type whose type name is __StaticArrayInitTypeSize=16 and it's field name is 1456763F890A84558F99AFA687C36B9037697848 and it is inside a class named <PrivateImplementationDetails> in the Root Namespace. if we look at this field we see that it contains the desired array entirely just as we want it coded to bytes:

.field assembly static initonly valuetype <PrivateImplementationDetails>/__StaticArrayInitTypeSize=16 1456763F890A84558F99AFA687C36B9037697848 = ((01 00 00 00 02 00 00 00 03 00 00 00 04 00 00 00))

The second line, calls a method which returns the initialized array using the empty array that we have just created in L_0060 and using this Compile-Time-Type.

If we try to look at this method's code we will see that it is implemented within the CLR:

[MethodImpl(MethodImplOptions.InternalCall), SecuritySafeCritical, __DynamicallyInvokable]
public static extern void InitializeArray(Array array, RuntimeFieldHandle fldHandle);

So either we need to find it's source code in the published CLR sources, which I couldn't find for this method, or we can debug in the assembly level. Since I am having trouble with my Visual-Studio right now and having problems with it's assembly view, Let's try another attitude and look at the memory writes for each array initialization.

Starting from the loop initialization, at the beginning we can see there is en empty int[] initialized (in the picture 0x724a3c88 seen in Little-Endian is the type of int[] and 0x00000004 is the size of the array, than we can see 16 bytes of zeros).

When the array is initialized we can see that the memory is filled with the same type and size indicators, only it also has the numbers 0 to 3 in it:

When the loop iterates we can see that the next array (signed in red) it allocated right after our first array (not signed), which implies also that each array consumes 16 + type + size + padding = 19 bytes:

Doing the same process on the inline-type-initializer we can see that after the array is initialized, the heap contains other types also other than our array; this is probably from within the System.Runtime.CompilerServices.InitializeArray method since the array pointer and the compile-time-type token are loaded on the evaluation stack and not on the heap (lines L_001B and L_0020 in the IL code):

Now allocating the next array with the inline array initializer shows us that the next array is allocated only 64 bytes after the beginning of the first array!

So the inline-array-initializer is slower at the minimum because of few reasons:

Much more memory is allocated (unwanted memory from within the CLR).
There is a method call overhead in addition to the array constructor.
Also if the CLR allocated more memory other than the array - it probably does some more unnecessary actions.

Now for the difference between Debug and Release in the inline array initializer:

If you inspect the assembly code of the debug version it looks like that:

00952E46 B9 42 5D FF 71       mov         ecx,71FF5D42h  //The pointer to the array.
00952E4B BA 04 00 00 00       mov         edx,4  //The desired size of the array.
00952E50 E8 D7 03 F7 FF       call        008C322C  //Array constructor.
00952E55 89 45 90             mov         dword ptr [ebp-70h],eax  //The result array (here the memory is an empty array but arr cannot be viewed in the debug yet).
00952E58 B9 E4 0E D7 00       mov         ecx,0D70EE4h  //The token of the compilation-time-type.
00952E5D E8 43 EF FE 72       call        73941DA5  //First I thought that's the System.Runtime.CompilerServices.InitializeArray method but thats the part where the junk memory is added so i guess it's a part of the token loading process for the compilation-time-type.
00952E62 89 45 8C             mov         dword ptr [ebp-74h],eax
00952E65 8D 45 8C             lea         eax,[ebp-74h]  
00952E68 FF 30                push        dword ptr [eax]  
00952E6A 8B 4D 90             mov         ecx,dword ptr [ebp-70h]  
00952E6D E8 81 ED FE 72       call        73941BF3  //System.Runtime.CompilerServices.InitializeArray method.
00952E72 8B 45 90             mov         eax,dword ptr [ebp-70h]  //Here the result array is complete  
00952E75 89 45 B4             mov         dword ptr [ebp-4Ch],eax

On the other hand the code for the release version looks like that:

003A2DEF B9 42 5D FF 71       mov         ecx,71FF5D42h  //The pointer to the array.
003A2DF4 BA 04 00 00 00       mov         edx,4  //The desired size of the array.
003A2DF9 E8 2E 04 F6 FF       call        0030322C  //Array constructor.
003A2DFE 83 C0 08             add         eax,8  
003A2E01 8B F8                mov         edi,eax  
003A2E03 BE 5C 29 8C 00       mov         esi,8C295Ch  
003A2E08 F3 0F 7E 06          movq        xmm0,mmword ptr [esi]  
003A2E0C 66 0F D6 07          movq        mmword ptr [edi],xmm0  
003A2E10 F3 0F 7E 46 08       movq        xmm0,mmword ptr [esi+8]  
003A2E15 66 0F D6 47 08       movq        mmword ptr [edi+8],xmm0

The debug optimization makes it impossible to view the memory of arr, since the local at the IL level is never set. As you can see this version is using movq which is for that matter the fastest way to copy the memory of the compilation-time-type to the initialized array by copying 2 times a QWORD (2 ints together!) which is exacly the content of our array which is 16 bit.