Search code examples
c#.netbenchmarkingbenchmarkdotnet

C# HashSet allocate memory on each Add even within capacity


I made a simple benchmark to test HashSet.Add method and for me results looks strange.

For the benchmarking I use BenchmarkDotNet available on github. Here is a full code of a benchmark:

using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;
using BenchmarkDotNet.Attributes;
using System.Collections.Generic;

public struct Point
{
    public int X, Y;
}

[MemoryDiagnoser]
public class Program
{
    public static void Main(string[] args)
    {
        BenchmarkRunner.Run<Program>();
    }

    [Params(10, 500, 1000)]
    public int ArrayLength { get; set; }

    [GlobalSetup]
    public void Setup()
    {
        hs = new HashSet<Point>(100);  // Capacity is 100 to definitely have space for 1 element
        p = new Point();               // Even point is struct I initialize it in Setup
        hs.Add(p);                     // Do warm-up run to be sure at least 1 element was there
    }

    Point p;
    HashSet<Point> hs;

    [Benchmark]
    public void Struct()
    {
        for (var i = 0; i < ArrayLength; i++)
        {                              // The test do the same operation multiple times
            hs.Clear();                // Clear hashset so there will be 0 elements
            hs.Add(p);                 // Add 1 element back
        }
    }
}

My expectation is that as HashSet already initialized with enough capacity AND I do this add only once before cleanup - there should be no additional allocations at all. But in reality here is a benchmark result executed with the command

dotnet run -c Release
Method ArrayLength Mean Error StdDev Median Gen 0 Allocated
Struct 10 644.2 ns 12.90 ns 28.86 ns 633.8 ns 0.0572 240 B
Struct 500 30,080.1 ns 151.63 ns 134.42 ns 30,096.3 ns 2.8687 12,000 B
Struct 1000 64,610.0 ns 1,264.29 ns 1,352.78 ns 64,436.2 ns 5.7373 24,000 B

The size of allocation bytes equal to number of iterations * 24. So each 'Add' do new allocation.

A few comments:

  • I tried the same benchmark with just Clear - 0 allocations, so it is definitely added by Add
  • Changing the number of fields in struct DO change the total amount of allocated bytes. For example 7 int fields uses 48 bytes per operation.

UPDATE

The problem is not in struct copy: I wrote another testcase to check that:

private Point t;
public void Test(Point p){
    t = p;
}

[Benchmark]
public void Struct()
{
    var test = new Point(1, 2);
    Test(test);
}

And in result Allocations = 0


Solution

  • Ok, the problem is found:

    HashSet is using GetHashCode method (what a surprise :) ). But this method is defined on Object level.

    To run methods from Object, .net requires to do boxing for struct and this is the memory that is allocated during the execution. Final test that proves it:

    using BenchmarkDotNet.Attributes;
    using BenchmarkDotNet.Running;
    using BenchmarkDotNet.Attributes;
    using System.Collections.Generic;
    using Test;
    
    public struct Point
    {
        public int X, Y;
    }
    
    [MemoryDiagnoser]
    public class Program
    {
        public static void Main(string[] args)
        {
            BenchmarkRunner.Run<Program>();
        }
    
        private Point p;
    
        [GlobalSetup]
        public void Setup()
        {
            p = new Point { X = 1, Y = 2 };               // Even point is struct I initialize it in Setup
        }
    
        [Benchmark]
        public void Struct()
        {
            var hc = p.GetHashCode();
        }
    }
    

    And the output:

    Method Mean Error StdDev Gen 0 Allocated
    Struct 40.03 ns 0.559 ns 0.523 ns 0.0057 24 B

    SOLUTION

    To solve the issue GethashCode should be overriden for the struct.

    Also do not forget to override bool Equals(object obj), and IEquatable as HashSet.Contains also use those methods.