c# .net list performance benchmarking

Add vs AddRange with LINQ / IEnumerable<T>


Back in the day, I was always told to use AddRange whenever possible because of the performance gain over Add. Today, I wanted to validate this claim before applying it to the output of a List<T>.Select(x => ...) call, but from the decompiled code of List<T>, I am under the impression that a foreach loop calling Add should be faster.

The main reason I see is the number of additional checks done along the way, since the result of Select is not an ICollection<T>:

  • null-check on the new items collection
  • boundary check on the index (since AddRange is in fact a call to InsertRange with index _size)
  • try type cast to ICollection
  • type check to know if it is an ICollection (which it isn't after the select call)
  • another boundary check when calling Insert
  • capacity check (this one also exists in Add)
  • position check (since Insert can also put data prior or within the actual list)
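
The path described in the list above can be sketched roughly as follows. This is a simplified illustration, not the actual List<T> source: `SketchList<T>`, `Evens` and the growth policy are hypothetical, and the real implementation varies between .NET versions. The sequence comes from a `yield return` iterator so that it is certainly not an ICollection<T>.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical, simplified list used only to illustrate the checks listed above.
public class SketchList<T>
{
    private T[] _items = new T[0];
    private int _size;

    public int Count => _size;

    public T this[int index] =>
        (uint)index < (uint)_size ? _items[index] : throw new ArgumentOutOfRangeException(nameof(index));

    public void Add(T item)
    {
        if (_size == _items.Length) EnsureCapacity(_size + 1); // capacity check
        _items[_size++] = item;
    }

    // AddRange modeled as InsertRange at the end of the list, as described in the question.
    public void AddRange(IEnumerable<T> collection) => InsertRange(_size, collection);

    public void InsertRange(int index, IEnumerable<T> collection)
    {
        if (collection == null) throw new ArgumentNullException(nameof(collection));          // null check
        if ((uint)index > (uint)_size) throw new ArgumentOutOfRangeException(nameof(index)); // boundary check

        if (collection is ICollection<T> c) // type test: the count is known up front
        {
            int count = c.Count;
            if (count == 0) return;
            EnsureCapacity(_size + count);
            if (index < _size) Array.Copy(_items, index, _items, index + count, _size - index);
            c.CopyTo(_items, index); // single bulk copy
            _size += count;
        }
        else
        {
            // Unknown length: fall back to one Insert call per element.
            foreach (T item in collection) Insert(index++, item);
        }
    }

    public void Insert(int index, T item)
    {
        if ((uint)index > (uint)_size) throw new ArgumentOutOfRangeException(nameof(index)); // boundary check again
        if (_size == _items.Length) EnsureCapacity(_size + 1);                                // capacity check
        if (index < _size) Array.Copy(_items, index, _items, index + 1, _size - index);       // position check
        _items[index] = item;
        _size++;
    }

    private void EnsureCapacity(int min)
    {
        if (_items.Length >= min) return;
        int newCapacity = _items.Length == 0 ? 4 : _items.Length * 2; // assumed growth policy
        if (newCapacity < min) newCapacity = min;
        Array.Resize(ref _items, newCapacity);
    }
}

public static class Program
{
    // Compiler-generated iterator: implements IEnumerable<int> but not ICollection<int>.
    public static IEnumerable<int> Evens(int count)
    {
        for (var i = 0; i < count; i++) yield return i * 2;
    }

    public static void Main()
    {
        var list = new SketchList<int>();
        list.AddRange(Evens(5));           // per-element Insert path
        list.AddRange(new[] { 100, 200 }); // bulk-copy path (arrays implement ICollection<T>)
        Console.WriteLine(list.Count);
    }
}
```

In this sketch, the per-element path repeats the boundary, capacity and position checks for every item, which is the overhead the question is asking about.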

Has anyone ever done reliable benchmarking on this?


Edit: Added Code samples of the two options

var clientsConfirmations = new List<ClientConfirmation>();

foreach (var client in Clients)
{
    var clientConf = new ClientConfirmation(client.Id, client.Name,
        client.HasFlag ?? false);
    clientsConfirmations.Add(clientConf);
}

Versus

var clientsConfirmations = new List<ClientConfirmation>();

clientsConfirmations.AddRange(
    Clients.Select(client =>
        new ClientConfirmation(client.Id, client.Name, client.HasFlag ?? false)
    )
);

Solution

  • I wrote a small test application to benchmark the two versions of the code. I'm not an expert at benchmarking, so if you find problems in it, let me know.

    I ran the benchmark (50k clients transformed 1k times per measurement, so 50M iterations each, repeated over 10 runs) and the results were:

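    | Iteration | First run | Add (ms) | AddRange (ms) | Delta (ms) |
    |-----------|-----------|----------|---------------|------------|
    | 1         | Add       | 25228    | 25385         | 157        |
    | 2         | AddRange  | 21561    | 24682         | 3121       |
    | 3         | Add       | 24182    | 25317         | 1135       |
    | 4         | AddRange  | 25647    | 24749         | -898       |
    | 5         | Add       | 23347    | 24508         | 1161       |
    | 6         | AddRange  | 22699    | 24416         | 1717       |
    | 7         | Add       | 25819    | 24491         | -1328      |
    | 8         | AddRange  | 19830    | 22113         | 2283       |
    | 9         | Add       | 19376    | 19762         | 386        |
    | 10        | AddRange  | 25119    | 25220         | 101        |
    | Avg       | -         | 23280.8  | 24064.3       | 783.5      |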

    I would say that in that specific context, Add is marginally faster than AddRange on average (about 3%), but not a clear winner: the run-to-run variation is larger than the average difference. Unless the code is performance critical, it is a matter of preference / keeping the code semantically clean.


    Benchmark Code

    using System;
    using System.Collections.Generic;
    using System.Linq;
    using System.Text;
    using System.Threading.Tasks;
    
    namespace Benchmarks
    {
        class Program
        {
            private class Client
            {
                public int Id { get; set; }
                public string Name { get; set; }
                public bool? HasFlag { get; set; }
            }
    
            private class ClientConfirmation
            {
                public int Id { get; set; }
                public string Name { get; set; }
                public bool Flag { get; set; }
    
                public ClientConfirmation(int id, string name, bool flag)
                {
                    Id = id;
                    Name = name;
                    Flag = flag;
                }
            }
    
            static void Main(string[] args)
            {
                var rng = new Random();
                for (var i = 1; i <= 10; i+=2)
                {
                    AddVsAddRange(rng, i);
                }
                _ = Console.Read();
            }
    
            private static void AddVsAddRange(Random rng, int iteration)
            {
                var clients = new List<Client>();
                for (var i = 0; i < 50000; i++)
                {
                    clients.Add(new Client()
                    {
                        Id = rng.Next(),
                        Name = Guid.NewGuid().ToString(),
                        HasFlag = rng.Next(0, 3) switch
                        {
                            0 => false,
                            1 => true,
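                            // Discard arm covers 2 and keeps the switch exhaustive;
                            // the cast gives the expression a bool? type.
                            _ => (bool?)null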
                        }
                    });
                }
    
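                // Odd-numbered row: Add (Version1) is timed first, then AddRange (Version2).
                Console.WriteLine($"| {iteration} | {Version1(clients)} | {Version2(clients)} |");
                // Even-numbered row: AddRange is timed first, to offset ordering effects.
                var v2 = Version2(clients);
                var v1 = Version1(clients);
                Console.WriteLine($"| {iteration+1} | {v1} | {v2} |");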
                clients.Clear();
            }
    
            private static long Version1(IEnumerable<Client> clients)
            {
                var sw = System.Diagnostics.Stopwatch.StartNew();
                var clientsConfirmations = new List<ClientConfirmation>();
    
                for (var i = 0; i < 1000; i++)
                    foreach (var client in clients)
                    {
                        bool flag = false;
    
                        if (client.HasFlag.HasValue)
                        {
                            flag = client.HasFlag.Value;
                        }
    
                        var clientConf = new ClientConfirmation(client.Id, client.Name, flag);
                        clientsConfirmations.Add(clientConf);
                    }
    
                var total =  sw.ElapsedMilliseconds;
                clientsConfirmations.Clear();
                return total;
            }
    
            private static long Version2(IEnumerable<Client> clients)
            {
                var sw = System.Diagnostics.Stopwatch.StartNew();
                var clientsConfirmations = new List<ClientConfirmation>();
    
                for (var i = 0; i < 1000; i++)
                    clientsConfirmations.AddRange(
                        clients.Select(client =>
                            new ClientConfirmation(client.Id, client.Name, client.HasFlag ?? false)
                        )
                    );
    
                var total = sw.ElapsedMilliseconds;
                clientsConfirmations.Clear();
                return total;
            }
        }
    }
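
Since the individual runs above vary by several seconds, one possible way to reduce noise is to add a warm-up pass, force a garbage collection before each timed run, and report the median instead of the mean. The following is only a sketch under those assumptions: `TimingSketch` and `MedianMilliseconds` are hypothetical names, and the workload (converting integers to strings) is a stand-in for the ClientConfirmation mapping.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

public static class TimingSketch
{
    // Runs `action` once untimed (JIT warm-up), then `runs` timed repetitions,
    // and returns the median elapsed time in milliseconds.
    public static double MedianMilliseconds(Action action, int runs)
    {
        if (runs <= 0) throw new ArgumentOutOfRangeException(nameof(runs));

        action(); // warm-up, not measured

        var samples = new double[runs];
        for (var i = 0; i < runs; i++)
        {
            // Start each run from a comparable heap state.
            GC.Collect();
            GC.WaitForPendingFinalizers();
            GC.Collect();

            var sw = Stopwatch.StartNew();
            action();
            sw.Stop();
            samples[i] = sw.Elapsed.TotalMilliseconds;
        }

        Array.Sort(samples);
        return runs % 2 == 1
            ? samples[runs / 2]
            : (samples[runs / 2 - 1] + samples[runs / 2]) / 2.0;
    }

    public static void Main()
    {
        int[] source = Enumerable.Range(0, 100_000).ToArray();

        double add = MedianMilliseconds(() =>
        {
            var target = new List<string>();
            foreach (var x in source) target.Add(x.ToString());
        }, runs: 9);

        double addRange = MedianMilliseconds(() =>
        {
            var target = new List<string>();
            target.AddRange(source.Select(x => x.ToString()));
        }, runs: 9);

        Console.WriteLine($"Add: {add:F2} ms, AddRange: {addRange:F2} ms");
    }
}
```

Each timed run here builds a fresh list, so no run inherits a backing array that an earlier run has already grown.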
    
    

    Edit: Create list with the right size

    As @PanagiotisKanavos mentioned, setting the list capacity at creation prevents the code from doing multiple allocations to fit the data. I modified the benchmark to take that into account:

        var clientsConfirmations = new List<ClientConfirmation>(clients.Count());

    Here are the results:

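    | Iteration | First run | Add (ms) | AddRange (ms) | Delta (ms) |
    |-----------|-----------|----------|---------------|------------|
    | 1         | Add       | 24150    | 23104         | -1046      |
    | 2         | AddRange  | 24602    | 20391         | -4211      |
    | 3         | Add       | 25723    | 25510         | -213       |
    | 4         | AddRange  | 23510    | 23041         | -469       |
    | 5         | Add       | 22621    | 20287         | -2334      |
    | 6         | AddRange  | 21417    | 22995         | 1578       |
    | 7         | Add       | 23371    | 25797         | 2426       |
    | 8         | AddRange  | 24457    | 24695         | 238        |
    | 9         | Add       | 24980    | 24686         | -294       |
    | 10        | AddRange  | 25486    | 24038         | -1448      |
    | Avg       | -         | 24031.7  | 23454.4       | -577.3     |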

    In that case, the trend is reversed and AddRange gets the win, but I can't figure out why...
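
To make the effect of the initial capacity visible, the sketch below counts how often `List<T>.Capacity` changes while items are added. `CapacitySketch` and `CountResizes` are hypothetical helper names; only `List<T>`, its capacity constructor and the `Capacity` property come from the framework.

```csharp
using System;
using System.Collections.Generic;

public static class CapacitySketch
{
    // Counts how many times the backing array is replaced while adding `count` items.
    public static int CountResizes(List<int> list, int count)
    {
        int resizes = 0;
        int lastCapacity = list.Capacity;
        for (var i = 0; i < count; i++)
        {
            list.Add(i);
            if (list.Capacity != lastCapacity)
            {
                resizes++;
                lastCapacity = list.Capacity;
            }
        }
        return resizes;
    }

    public static void Main()
    {
        const int n = 50_000;

        // No initial capacity: the list has to grow repeatedly.
        Console.WriteLine(CountResizes(new List<int>(), n));

        // Capacity matches the number of items: no growth needed.
        Console.WriteLine(CountResizes(new List<int>(n), n));

        // Capacity covers only half of the items: growth resumes once it is exceeded.
        Console.WriteLine(CountResizes(new List<int>(n), 2 * n));
    }
}
```

The third case may be relevant to the benchmark above: the list there keeps accumulating across the 1,000 passes, so a capacity of clients.Count() only covers the first pass.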