I made a simple benchmark to test HashSet.Add method and for me results looks strange.
For the benchmarking I use BenchmarkDotNet available on github. Here is a full code of a benchmark:
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;
using BenchmarkDotNet.Attributes;
using System.Collections.Generic;
public struct Point
{
public int X, Y;
}
[MemoryDiagnoser]
public class Program
{
public static void Main(string[] args)
{
BenchmarkRunner.Run<Program>();
}
[Params(10, 500, 1000)]
public int ArrayLength { get; set; }
[GlobalSetup]
public void Setup()
{
hs = new HashSet<Point>(100); // Capacity is 100 to definitely have space for 1 element
p = new Point(); // Even point is struct I initialize it in Setup
hs.Add(p); // Do warm-up run to be sure at least 1 element was there
}
Point p;
HashSet<Point> hs;
[Benchmark]
public void Struct()
{
for (var i = 0; i < ArrayLength; i++)
{ // The test do the same operation multiple times
hs.Clear(); // Clear hashset so there will be 0 elements
hs.Add(p); // Add 1 element back
}
}
}
My expectation is that as HashSet already initialized with enough capacity AND I do this add only once before cleanup - there should be no additional allocations at all. But in reality here is a benchmark result executed with the command
dotnet run -c Release
Method | ArrayLength | Mean | Error | StdDev | Median | Gen 0 | Allocated |
---|---|---|---|---|---|---|---|
Struct | 10 | 644.2 ns | 12.90 ns | 28.86 ns | 633.8 ns | 0.0572 | 240 B |
Struct | 500 | 30,080.1 ns | 151.63 ns | 134.42 ns | 30,096.3 ns | 2.8687 | 12,000 B |
Struct | 1000 | 64,610.0 ns | 1,264.29 ns | 1,352.78 ns | 64,436.2 ns | 5.7373 | 24,000 B |
The size of allocation bytes equal to number of iterations * 24. So each 'Add' do new allocation.
A few comments:
UPDATE
The problem is not in struct copy: I wrote another testcase to check that:
private Point t;
public void Test(Point p){
t = p;
}
[Benchmark]
public void Struct()
{
var test = new Point(1, 2);
Test(test);
}
And in result Allocations = 0
Ok, the problem is found:
HashSet is using GetHashCode method (what a surprise :) ). But this method is defined on Object level.
To run methods from Object, .net requires to do boxing for struct and this is the memory that is allocated during the execution. Final test that proves it:
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;
using BenchmarkDotNet.Attributes;
using System.Collections.Generic;
using Test;
public struct Point
{
public int X, Y;
}
[MemoryDiagnoser]
public class Program
{
public static void Main(string[] args)
{
BenchmarkRunner.Run<Program>();
}
private Point p;
[GlobalSetup]
public void Setup()
{
p = new Point { X = 1, Y = 2 }; // Even point is struct I initialize it in Setup
}
[Benchmark]
public void Struct()
{
var hc = p.GetHashCode();
}
}
And the output:
Method | Mean | Error | StdDev | Gen 0 | Allocated |
---|---|---|---|---|---|
Struct | 40.03 ns | 0.559 ns | 0.523 ns | 0.0057 | 24 B |
SOLUTION
To solve the issue GethashCode should be overriden for the struct.
Also do not forget to override bool Equals(object obj), and IEquatable as HashSet.Contains also use those methods.