Why does adding an extra field to a struct greatly improve its performance?

I noticed that a struct wrapping a single float is significantly slower than using a float directly: it runs at roughly half the speed.

using System;
using System.Diagnostics;

struct Vector1 {

    public float X;

    public Vector1(float x) {
        X = x;
    }

    public static Vector1 operator +(Vector1 a, Vector1 b) {
        a.X = a.X + b.X;
        return a;
    }
}

However, upon adding an additional 'extra' field, some magic seems to happen and performance once again becomes more reasonable:

struct Vector1Magic {

    public float X;
    private bool magic;

    public Vector1Magic(float x) {
        X = x;
        magic = true;
    }

    public static Vector1Magic operator +(Vector1Magic a, Vector1Magic b) {
        a.X = a.X + b.X;
        return a;
    }
}

The code I used to benchmark these is as follows:

class Program {
    static void Main(string[] args) {
        int iterationCount = 1000000000;
        var sw = new Stopwatch();

        // Baseline: accumulate raw floats.
        sw.Restart();
        var total = 0.0f;
        for (int i = 0; i < iterationCount; i++) {
            var v = (float) i;
            total = total + v;
        }
        sw.Stop();
        Console.WriteLine("Float time was {0} for {1} iterations.", sw.Elapsed, iterationCount);
        Console.WriteLine("total = {0}", total);

        // Struct wrapping a single float.
        sw.Restart();
        var totalV = new Vector1(0.0f);
        for (int i = 0; i < iterationCount; i++) {
            var v = new Vector1(i);
            totalV += v;
        }
        sw.Stop();
        Console.WriteLine("Vector1 time was {0} for {1} iterations.", sw.Elapsed, iterationCount);
        Console.WriteLine("totalV = {0}", totalV);

        // Same struct with the extra bool field.
        sw.Restart();
        var totalVm = new Vector1Magic(0.0f);
        for (int i = 0; i < iterationCount; i++) {
            var vm = new Vector1Magic(i);
            totalVm += vm;
        }
        sw.Stop();
        Console.WriteLine("Vector1Magic time was {0} for {1} iterations.", sw.Elapsed, iterationCount);
        Console.WriteLine("totalVm = {0}", totalVm);
    }
}

With the benchmark results:

Float time was 00:00:02.2444910 for 1000000000 iterations.
Vector1 time was 00:00:04.4490656 for 1000000000 iterations.
Vector1Magic time was 00:00:02.2262701 for 1000000000 iterations.

Compiler/environment settings:

  • OS: Windows 10 64 bit
  • Toolchain: VS2017
  • Framework: .Net 4.6.2
  • Target: Any CPU, Prefer 32 bit
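
Because the results depend heavily on whether the 32 bit or the 64 bit JIT is used, it may be worth confirming which one a given run actually got. The following is a minimal sketch, not part of the original benchmark; the class name BitnessCheck and its label strings are made up for illustration.

```csharp
using System;

static class BitnessCheck {
    // Hypothetical helper: reports whether this process is running as 32 or 64 bit.
    public static string Describe() {
        return Environment.Is64BitProcess ? "64-bit process" : "32-bit process";
    }

    static void Main() {
        Console.WriteLine(Describe());
        // IntPtr.Size is 4 in a 32 bit process and 8 in a 64 bit process.
        Console.WriteLine("IntPtr.Size = {0}", IntPtr.Size);
    }
}
```

Printing this at the top of the benchmark makes it harder to mix up timings from the two targets.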

If 64 bit is set as the target, our results are more predictable, but significantly worse than what we see with Vector1Magic on the 32 bit target:

Float time was 00:00:00.6800014 for 1000000000 iterations.
Vector1 time was 00:00:04.4572642 for 1000000000 iterations.
Vector1Magic time was 00:00:05.7806399 for 1000000000 iterations.

For the real wizards, I've included a dump of the IL here:

Further investigation indicates that this seems to be specific to the Windows runtime, as the Mono compiler produces the same IL.

On the Mono runtime, both struct variants are roughly 2x slower than the raw float. This is quite different from the performance we see on .Net.

What's going on here?

*Note this question originally included a flawed benchmark process (Thanks Max Payne for pointing this out), and has been updated to more accurately reflect the timings.


  • The JIT has an optimization known as "struct promotion", where it can effectively replace a struct local or argument with multiple locals, one for each of the struct's fields.

    However, struct promotion is disabled for a struct that wraps a single float. The reasons are a bit obscure, but roughly:

    • structs that simply wrap primitive types are treated as integer values of the struct size when being passed to or returned from calls
    • during promotion analysis the JIT can't tell if the struct is ever passed to or returned from a call
    • the code sequences needed at calls to reclassify an int as a float (and vice versa) are thought to be expensive at runtime
    • hence the struct is not promoted, and so accesses to and operations on the float field are a bit slower

    So, roughly speaking, the JIT is prioritizing lower costs at call sites over lower costs at the places where the field is used. Sometimes (as in your case above, where operation costs predominate) this is not the right call.
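
    To make "promotion" concrete, here is a rough source-level sketch of the idea. It is only an analogy: the JIT does this on its internal representation, not on C# source. The class PromotionSketch and its method names are hypothetical. The second method is what the first would look like if the struct local were replaced by a plain float local for its only field.

    ```csharp
    using System;

    struct Vector1 {
        public float X;
        public Vector1(float x) { X = x; }
        public static Vector1 operator +(Vector1 a, Vector1 b) {
            a.X = a.X + b.X;
            return a;
        }
    }

    static class PromotionSketch {
        // The code as written: every value lives in a Vector1 struct.
        public static float SumWithStruct(int n) {
            var total = new Vector1(0.0f);
            for (int i = 0; i < n; i++) {
                total += new Vector1(i);
            }
            return total.X;
        }

        // Promoted by hand: each struct local becomes a float local
        // standing in for the field X.
        public static float SumPromotedByHand(int n) {
            float totalX = 0.0f;
            for (int i = 0; i < n; i++) {
                float vX = i;
                totalX = totalX + vX;
            }
            return totalX;
        }

        static void Main() {
            Console.WriteLine(SumWithStruct(1000));
            Console.WriteLine(SumPromotedByHand(1000));
        }
    }
    ```

    Both methods compute the same sum; the difference is only in how much work the generated code does to get at the float.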

    As you have seen, if you make the struct larger then the rules for passing and returning the struct change (it is now passed and returned by reference), and this unblocks promotion.
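
    One quick way to see that the extra field really does make the struct larger is to compare the sizes of the two types. This is a sketch under an assumption: Marshal.SizeOf reports the marshaled (unmanaged) size, which is used here only as a convenient proxy for the layout the JIT sees and may not match it exactly. The class name SizeCheck is made up for illustration.

    ```csharp
    using System;
    using System.Runtime.InteropServices;

    struct Vector1 {
        public float X;
    }

    struct Vector1Magic {
        public float X;
        private bool magic;
        public Vector1Magic(float x) { X = x; magic = true; }
    }

    static class SizeCheck {
        static void Main() {
            // Marshaled sizes, used as an approximation of the struct layout.
            Console.WriteLine("Vector1: {0} bytes", Marshal.SizeOf(typeof(Vector1)));
            Console.WriteLine("Vector1Magic: {0} bytes", Marshal.SizeOf(typeof(Vector1Magic)));
        }
    }
    ```

    The first struct should come out float-sized and the second larger, which is consistent with the two types falling under different passing and returning rules.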

    In the CoreCLR sources you can see this logic at play in Compiler::lvaShouldPromoteStructVar.