I ran into an interesting livelock-like situation caused by mixing blocking and asynchronous code.
Consider the code below: it runs for about a minute even though the useful payload takes almost no time. The reason is that we hit the thread pool's growth limit (roughly one new thread per second), so the 60 iterations below take about 1 minute, and 300 iterations would take around 5 minutes.
This is not the trivial deadlock where we synchronously wait on an asynchronous operation in an environment whose SynchronizationContext allows scheduling work on a single thread only (e.g. WPF, WebAPI, etc). The code below reproduces the issue in a Console Application, where no explicit SynchronizationContext is set and tasks are scheduled on the thread pool.
I know that the "solution" to this problem is "asynchrony all the way". In the real world, though, we might not know that somewhere deep inside, the developer of SyncMethod suppresses asynchrony by waiting on it in a blocking way, unleashing such issues (even if they used the trick of replacing the SynchronizationContext so that it at least does not deadlock).
What are your suggestions to deal with such an issue when "asynchrony all the way" is not an option? Is there something else rather than obvious "do not spawn so many tasks at once"?
    void Main()
    {
        List<Task> tasks = new List<Task>();
        for (int i = 0; i < 60; i++)
            tasks.Add(Task.Run(() => SyncMethod()));
        bool exit = false;
        Task.WhenAll(tasks.ToArray()).ContinueWith(t => exit = true);
        while (!exit)
        {
            Print($"Thread count: {Process.GetCurrentProcess().Threads.Count}");
            Thread.Sleep(1000);
        }
    }

    void SyncMethod()
    {
        SomethingAsync().Wait();
    }

    async Task SomethingAsync()
    {
        await Task.Delay(1);
        await Task.Delay(1); // extra puzzle -- why commenting one of these Delay will partially resolve the issue?
        Print("async done");
    }

    void Print(object obj)
    {
        $"[{Thread.CurrentThread.ManagedThreadId}] {DateTime.Now} - {obj}".Dump();
    }
Here is the output. Notice how all the async continuations are stuck for almost a minute and then all of a sudden continue execution.
    [12] 30.01.2018 23:34:36 - Thread count: 18
    [12] 30.01.2018 23:34:37 - Thread count: 32
    [12] 30.01.2018 23:34:38 - Thread count: 33 -- THREAD POOL STARTS TO GROW
    ...
    [12] 30.01.2018 23:35:18 - Thread count: 70
    [12] 30.01.2018 23:35:19 - Thread count: 71
    [12] 30.01.2018 23:35:20 - Thread count: 72 -- UNTIL ALL SCHEDULED TASKS CAN FIT
    [8] 30.01.2018 23:35:20 - async done -- ALMOST A MINUTE AFTER START
    [8] 30.01.2018 23:35:20 - async done -- THE CONTINUATIONS START GO THROUGH
    ...
    [61] 30.01.2018 23:35:20 - async done
    [10] 30.01.2018 23:35:20 - async done
Answering the original question:
What are your suggestions to deal with such an issue when "asynchrony all the way" is not an option? Is there something else rather than obvious "do not spawn so many tasks at once"?
This is by no means a fix for the root cause, but it is a quantitative remedy: we can tune the thread pool with SetMinThreads, increasing the number of threads that are created without delay (much faster than the regular "injection rate", which on my setup is one thread pool thread per second). The way it works in this setup is simple. We keep wasting thread pool threads until the pool grows big enough to start executing the continuations. If we start with a big enough pool, we eliminate the period during which we are bound only by the artificial "injection rate", which tries to keep the number of threads low (which makes sense, as the thread pool is designed to run CPU-bound work rather than to sit blocked waiting on asynchronous operations).
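As a rough sketch of that adjustment, assuming a console application: the helper name EnsureMinWorkerThreads and the value 64 are illustrative choices of mine (picked only to exceed the 60 blocked tasks in the repro), not recommended settings. The sketch raises only the worker-thread minimum and leaves the I/O completion-port minimum as it was.

```csharp
using System;
using System.Threading;

static class ThreadPoolTuning
{
    // Hypothetical helper: raises the worker-thread minimum to at least
    // `desiredWorkers`, keeping the current I/O completion-port minimum.
    // Returns false if the thread pool rejects the new value.
    public static bool EnsureMinWorkerThreads(int desiredWorkers)
    {
        ThreadPool.GetMinThreads(out int workers, out int ioThreads);
        if (workers >= desiredWorkers)
            return true; // already high enough, nothing to change

        return ThreadPool.SetMinThreads(desiredWorkers, ioThreads);
    }

    static void Main()
    {
        // 64 is an assumption for this repro (60 blocked tasks plus headroom).
        bool ok = EnsureMinWorkerThreads(64);

        ThreadPool.GetMinThreads(out int workersNow, out int ioNow);
        Console.WriteLine($"Changed: {ok}, min worker threads: {workersNow}, min IO threads: {ioNow}");
    }
}
```

The call should happen once at startup, before the burst of blocking work is queued, since it only affects how quickly threads are injected from that point on.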
I should also add a warning note from the documentation:
By default, the minimum number of threads is set to the number of processors on a system. You can use the SetMinThreads method to increase the minimum number of threads. However, unnecessarily increasing these values can cause performance problems. If too many tasks start at the same time, all of them might appear to be slow. In most cases, the thread pool will perform better with its own algorithm for allocating threads. Reducing the minimum to less than the number of processors can also hurt performance.
There is also an interesting issue where Microsoft recommends increasing the "min threads" for ASP.NET as a performance/reliability improvement in some scenarios.
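For completeness, the "do not spawn so many tasks at once" option from the question can be sketched as follows. This is only an illustration under my own assumptions: the class name ThrottledBlocking, the method RunThrottled, and the choice of "minimum worker threads minus one" as the concurrency limit are all hypothetical. The idea is to keep the number of blocked callers below the pool's minimum, so that a thread should remain available to run the continuations.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

static class ThrottledBlocking
{
    // Same shape as the repro: an async method that is blocked on synchronously.
    static async Task SomethingAsync()
    {
        await Task.Delay(1);
        await Task.Delay(1);
    }

    static void SyncMethod() => SomethingAsync().Wait();

    // Hypothetical helper: runs `iterations` blocking calls, but never more
    // than (min worker threads - 1) at once. The limit is an assumption.
    public static int RunThrottled(int iterations)
    {
        ThreadPool.GetMinThreads(out int minWorkers, out int minIo);
        int limit = Math.Max(1, minWorkers - 1);

        var options = new ParallelOptions { MaxDegreeOfParallelism = limit };
        int completed = 0;

        Parallel.For(0, iterations, options, i =>
        {
            SyncMethod();
            Interlocked.Increment(ref completed);
        });

        return completed;
    }

    static void Main()
    {
        Console.WriteLine($"Completed: {RunThrottled(60)}");
    }
}
```

The trade-off is the obvious one: throughput is capped by the limit, but the blocked calls no longer depend on the thread injection rate to make progress.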
Interestingly, the problem described in the question is not purely imaginary. It is real, and it happens in well-known and widely used software. An example from my own experience -- Identity Server 3.
https://github.com/IdentityServer/IdentityServer3.EntityFramework/issues/101
The implementation that has this caveat (we had to rewrite it to work around the problem for our production scenario):
Another article that explains the issue in detail.
As for the strange behavior with a single Task.Delay, where some async invocations complete with each newly injected thread pool thread: it seems to be caused by inlined continuation execution combined with the way Task.Delay and Timer are implemented. See the call stack below. It shows that a newly created thread pool thread does some additional work for .NET timers when it is created, before it starts processing the thread pool queue (see System.Threading.TimerQueue.AppDomainTimerCallback).
    at AsynchronySamples.StrangeTimer.Program.d__2.MoveNext()
    at System.Runtime.CompilerServices.AsyncMethodBuilderCore.MoveNextRunner.InvokeMoveNext(Object stateMachine)
    at System.Threading.ExecutionContext.RunInternal(ExecutionContext executionContext, ContextCallback callback, Object state, Boolean preserveSyncCtx)
    at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state, Boolean preserveSyncCtx)
    at System.Runtime.CompilerServices.AsyncMethodBuilderCore.MoveNextRunner.Run()
    at System.Runtime.CompilerServices.AsyncMethodBuilderCore.c__DisplayClass4_0.b__0()
    at System.Runtime.CompilerServices.AsyncMethodBuilderCore.ContinuationWrapper.Invoke()
    at System.Runtime.CompilerServices.TaskAwaiter.c__DisplayClass11_0.b__0()
    at System.Runtime.CompilerServices.AsyncMethodBuilderCore.ContinuationWrapper.Invoke()
    at System.Threading.Tasks.AwaitTaskContinuation.RunOrScheduleAction(Action action, Boolean allowInlining, Task& currentTask)
    at System.Threading.Tasks.Task.FinishContinuations()
    at System.Threading.Tasks.Task.FinishStageThree()
    at System.Threading.Tasks.Task`1.TrySetResult(TResult result)
    at System.Threading.Tasks.Task.DelayPromise.Complete()
    at System.Threading.Tasks.Task.c.b__274_1(Object state)
    at System.Threading.TimerQueueTimer.CallCallbackInContext(Object state)
    at System.Threading.ExecutionContext.RunInternal(ExecutionContext executionContext, ContextCallback callback, Object state, Boolean preserveSyncCtx)
    at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state, Boolean preserveSyncCtx)
    at System.Threading.TimerQueueTimer.CallCallback()
    at System.Threading.TimerQueueTimer.Fire()
    at System.Threading.TimerQueue.FireNextTimers()
    at System.Threading.TimerQueue.AppDomainTimerCallback(Int32 id)
    [Native to Managed Transition]
    at kernel32.dll!74e86359()
    at kernel32.dll![Frames below may be incorrect and/or missing, no symbols loaded for kernel32.dll]
    at ntdll.dll!77057b74()
    at ntdll.dll!77057b44()