Search code examples
c#.netconcurrencyconcurrent-collectionsimmutable-collections

When immutable collections are preferable than concurrent


Recently read about immutable collections. They are recommended to be used as a thread safe for reading, when the read operations are performed more often than write.

Then I want to test read performance ImmutableDictionary vs ConcurrentDictionary. Here is this very simple test (in .NET Core 2.1):

using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Collections.Immutable;
using System.Diagnostics;
using System.Linq;
using System.Threading.Tasks;

namespace ImmutableSpeedTests
{
    class Program
    {
        public class ConcurrentVsImmutable
        {
            public int ValuesCount;
            public int ThreadsCount;

            private ImmutableDictionary<int, int> immutable = ImmutableDictionary<int, int>.Empty;
            private ConcurrentDictionary<int, int> concurrent = new ConcurrentDictionary<int, int>();

            public ConcurrentVsImmutable(int valuesCount, int threadsCount)
            {
                ValuesCount = valuesCount;
                ThreadsCount = threadsCount;
            }

            public void Setup()
            {
                // fill both collections. I don't measure time cause immutable is filling much slower obviously.
                for (var i = 0; i < ValuesCount; i++)
                {
                    concurrent[i] = i;
                    immutable = immutable.Add(i, i);
                }
            }

            public async Task<long> ImmutableSum() => await Sum(immutable);

            public async Task<long> ConcurrentSum() => await Sum(concurrent);

            private async Task<long> Sum(IReadOnlyDictionary<int, int> dic)
            {
                var tasks = new List<Task<long>>();

                // main job. Run multiple tasks to sum all values.
                for (var i = 0; i < ThreadsCount; i++)
                    tasks.Add(Task.Run(() =>
                    {
                        long x = 0;
                        foreach (var key in dic.Keys)
                        {
                            x += dic[key];
                        }
                        return x;
                    }));

                var result = await Task.WhenAll(tasks.ToArray());
                return result.Sum();
            }
        }

        static void Main(string[] args)
        {
            var test = new ConcurrentVsImmutable(1000000, 4);

            test.Setup();

            var sw = new Stopwatch();

            sw.Start();
            var result = test.ConcurrentSum().Result;
            sw.Stop();

            // Convince that the result of the work is the same
            Console.WriteLine($"Concurrent. Result: {result}. Elapsed: {sw.ElapsedTicks}.");

            sw.Reset();
            sw.Start();
            result = test.ImmutableSum().Result;
            sw.Stop();

            Console.WriteLine($" Immutable. Result: {result}. Elapsed: {sw.ElapsedTicks}.");

            Console.ReadLine();
        }
    }
}

You can run this code. Elapsed time in ticks will differ from time to time but the time spent by ConcurrentDictionary is several times less than by ImmutableDictionary.

This experiment makes me embarrassed. Did I do it wrong? What the reason to use immutable collections if we have concurrent? When they are preferable?


Solution

  • Immutable collections are not alternative to concurrent collections. And the way they are designed to reduce memory consumption, they are bound to be slower, the trade off here is to use less memory and thus by using less n operations to do anything.

    We usually copy collections to other collections to achieve immutability to persist state. Lets see what it means,

     var s1 = ImmutableStack<int>.Empty;
     var s2 = s1.Push(1);
     // s2 = [1]
    
     var s3 = s2.Push(2);
     // s2 = [1]
     // s3 = [1,2]
    
     // notice that s2 has only one item, it is not modified..
    
     var s4 = s3.Pop(ref var i);
    
     // s2 = [1];
     // still s2 has one item...
    

    Notice that, s2 always has only one item. Even if all items are removed.

    The way all data is stored internally is a huge tree and your collection is pointing to a branch which has descendants that represents initial state of the tree.

    I don't think the performance can be matched with concurrent collection where goals are totally different.

    In concurrent collection, you have a single copy of collection accessed by all threads.

    In immutable collection you have virtually isolated copy of a tree, navigating that tree is always costly.

    It is useful in transactional system, where if a transaction has to be rolled back, state of collection can be retained in commit points.