I saw this question today about a performance difference between ConcurrentDictionary methods, and I saw it as a premature micro-optimization.
However, on reflection I realized (if I am not mistaken) that each time we pass a lambda to a method, the CLR needs to allocate memory for the delegate, pass the appropriate closure (if needed), and then collect it some time later.
There are three possibilities:
1. Lambda without a closure:
// the lambda should internally compile to a static method,
// but will the CLR instantiate a new ManagedDelegate wrapper or
// something like that?
return concurrent_dict.GetOrAdd(key, k => ValueFactory(k));
2. Lambda with a closure:
// this is definitely an allocation
return concurrent_dict.GetOrAdd(key, k => ValueFactory(k, stuff));
3. Outside check (like checking the condition before the lock):
// no lambdas in the hot path
if (concurrent_dict.TryGetValue(key, out value))
    return value;
return concurrent_dict.GetOrAdd(key, k => ValueFactory(k));
The third case is obviously allocation-free on the hot path (when the key already exists), and the second one will need an allocation.
But is the first case (lambda which doesn't have a capture) completely allocation-free (at least in newer CLR versions)? Also, is this an implementation detail of the runtime, or something specified by the standard?
First of all, the CLR does not know what a lambda is. It is a C# concept that is compiled away: wherever you write a lambda, the C# compiler gives you a delegate value.
C# does not guarantee whether the delegate instance (or the underlying method) is shared. In fact, I believe the initialization of shared lambda delegates is not thread-safe and is racy, so depending on timing you might see one delegate instance or several.
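To make the caching and the race concrete, here is a hand-written approximation of what the compiler might generate for a non-capturing lambda. This is only a sketch: the names LoweredLambdaSketch, s_cachedFactory and GetFactory are made up for illustration, ValueFactory is a stand-in for the method in the question, and the real compiler output uses hidden generated names and may differ in detail.

```csharp
using System;

public static class LoweredLambdaSketch
{
    // Hypothetical stand-in for the hidden static field the compiler emits
    // to cache the delegate created for a non-capturing lambda.
    private static Func<string, int> s_cachedFactory;

    // Hypothetical value factory, standing in for ValueFactory in the question.
    public static int ValueFactory(string key) => key.Length;

    public static Func<string, int> GetFactory()
    {
        // Unsynchronized check-then-assign: two threads that both observe null
        // may each allocate a delegate; whichever write lands last stays cached.
        if (s_cachedFactory == null)
        {
            s_cachedFactory = new Func<string, int>(ValueFactory);
        }
        return s_cachedFactory;
    }

    public static void Main()
    {
        Func<string, int> first = GetFactory();
        Func<string, int> second = GetFactory();

        // Single-threaded, the second call reuses the cached instance.
        Console.WriteLine(ReferenceEquals(first, second)); // True
        Console.WriteLine(first("abc"));                   // 3
    }
}
```

The race is benign: any extra delegate created by a losing thread behaves identically and simply becomes garbage, which is presumably why no synchronization is used.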
So it's an implementation detail of the language.
In practice you can rely on forms 1 and 3 being shared. This is important for performance. If this were ever not the case, I think it would be considered a high-priority bug.
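If you want to check this on your own compiler and runtime, a small test along the following lines should do. It is a minimal sketch: DelegateSharingCheck, NonCapturing and Capturing are hypothetical names introduced here, and the result of the first comparison is exactly the implementation detail discussed above, so treat it as an observation rather than a guarantee.

```csharp
using System;

public static class DelegateSharingCheck
{
    // Non-capturing lambda: the compiler is free to cache a single delegate.
    public static Func<int, int> NonCapturing() => k => k + 1;

    // Capturing lambda: each call needs a fresh closure object holding
    // 'offset', plus a delegate bound to that closure.
    public static Func<int, int> Capturing(int offset) => k => k + offset;

    public static void Main()
    {
        // Reports whether the non-capturing delegate is shared between calls.
        // I would expect True with current compilers, but the language does
        // not promise it.
        Console.WriteLine(ReferenceEquals(NonCapturing(), NonCapturing()));

        // Each capturing call allocates its own delegate, so these differ.
        Console.WriteLine(ReferenceEquals(Capturing(1), Capturing(1))); // False
    }
}
```

For actual allocation counts rather than reference identity, a memory profiler or a benchmark harness with allocation diagnostics would be the more reliable tool.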