I suspect I am falling prey to https://learn.microsoft.com/en-us/dotnet/api/system.diagnostics.process.standardoutput?view=net-9.0#remarks, specifically the case when the parent process reads both stdout and stderr.
I cannot reproduce it, but a coworker of mine can, which makes it somewhat harder to investigate.
The deadlock occurs inside process.StandardOutput.ReadLineAsync
in the method below.
public async Task<ProcessRunnerResult> Run(string executablePath,
    IProcessRunner.Mode mode,
    string? args = null,
    string? workingDirectory = null,
    Action<ProcessStartInfo>? beforeInvoke = null,
    Action<string>? notifyStdOutLine = null,
    CancellationToken cancellationToken = default)
{
    var process = new Process
    {
        StartInfo = new ProcessStartInfo
        {
            FileName = executablePath,
            Arguments = args,
            RedirectStandardOutput = true,
            RedirectStandardError = true,
            CreateNoWindow = true,
            UseShellExecute = false,
            WorkingDirectory = workingDirectory
        }
    };
    string? stdErr = null;
    try
    {
        beforeInvoke?.Invoke(process.StartInfo);
        process.Start();
        string? line;
        while ((line = await process.StandardOutput.ReadLineAsync(cancellationToken)) != null)
        {
            notifyStdOutLine?.Invoke(line);
        }
        stdErr = await process.StandardError.ReadToEndAsync(cancellationToken);
    }
    catch (Exception ex)
    {
        if (mode == IProcessRunner.Mode.NeverThrow)
        {
            return new ProcessRunnerResult(-1, null, ExceptionDispatchInfo.Capture(ex));
        }
        throw new ProcessRunnerException(executablePath, args, -1, ex);
    }
    if (mode == IProcessRunner.Mode.RethrowAndIfNonZeroExitCode && process.ExitCode != 0)
    {
        throw new ProcessRunnerException(executablePath, args, process.ExitCode);
    }
    return new ProcessRunnerResult(process.ExitCode, stdErr);
}
It is as if while my process awaits to get stdout, the child process overflows the stderr buffer and waits for my process to clear it and hence both of us deadlock.
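To make the suspicion concrete, here is a minimal hypothetical child (not my actual child process, just an illustration) that would trigger exactly this: it floods stderr while the parent above only reads stdout, so once the stderr pipe buffer fills up, both sides block forever.

using System;

class Child
{
    static void Main()
    {
        // Hypothetical child for illustration only: writes far more to stderr
        // than a Windows anonymous pipe buffer can hold (a few KB).
        for (int i = 0; i < 100_000; i++)
        {
            Console.Error.WriteLine($"diagnostic line #{i}");
        }
        // Never reached once Console.Error.WriteLine blocks on the full pipe:
        Console.Out.WriteLine("done");
    }
}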
My question is: how can we troubleshoot such a deadlock using WinDbg and a full memory dump created with procdump -ma?
The two most common commands I have encountered while searching the web are useless for me:
!sos.syncblk
does not show anything useful:
0:010> !syncblk
Index SyncBlock MonitorHeld Recursion Owning Thread Info SyncBlock Owner
-----------------------------
Total 25
CCW 0
RCW 0
ComClassFactory 0
Free 1
and the !dlk
command is unavailable, because sosex does not support .NET Core.
I ran !clrstack -all
and got this:
0:010> !clrstack -all
OS Thread Id: 0x11978
Child SP IP Call Site
00000040CDD7E3B8 00007ffb771b0d24 [HelperMethodFrame_1OBJ: 00000040cdd7e3b8] System.Threading.Monitor.ObjWait(Int32, System.Object)
00000040CDD7E4E0 00007ffadca6d83e System.Threading.Monitor.Wait(System.Object, Int32) [/_/src/coreclr/System.Private.CoreLib/src/System/Threading/Monitor.CoreCLR.cs @ 156]
00000040CDD7E510 00007ffadca788ee System.Threading.ManualResetEventSlim.Wait(Int32, System.Threading.CancellationToken) [/_/src/libraries/System.Private.CoreLib/src/System/Threading/ManualResetEventSlim.cs @ 561]
00000040CDD7E5B0 00007ffadca91a09 System.Threading.Tasks.Task.SpinThenBlockingWait(Int32, System.Threading.CancellationToken) [/_/src/libraries/System.Private.CoreLib/src/System/Threading/Tasks/Task.cs @ 3072]
00000040CDD7E630 00007ffadca917f6 System.Threading.Tasks.Task.InternalWaitCore(Int32, System.Threading.CancellationToken) [/_/src/libraries/System.Private.CoreLib/src/System/Threading/Tasks/Task.cs @ 3007]
00000040CDD7E6B0 00007ffadcae3815 System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(System.Threading.Tasks.Task, System.Threading.Tasks.ConfigureAwaitOptions) [/_/src/libraries/System.Private.CoreLib/src/System/Runtime/CompilerServices/TaskAwaiter.cs @ 104]
00000040CDD7E6F0 00007ffadcbec570 System.Runtime.CompilerServices.TaskAwaiter`1[[System.Int32, System.Private.CoreLib]].GetResult() [/_/src/libraries/System.Private.CoreLib/src/System/Runtime/CompilerServices/TaskAwaiter.cs @ 335]
00000040CDD7E720 00007ffa7e62718a Aida.Cli.Program.<Main>(System.String[])
OS Thread Id: 0x8388
Child SP IP Call Site
00000040CE7FF670 00007ffb771b0d24 [DebuggerU2MCatchHandlerFrame: 00000040ce7ff670]
OS Thread Id: 0x6064
Child SP IP Call Site
00000040CEC7F2E8 00007ffb771b0d24 [HelperMethodFrame: 00000040cec7f2e8] System.Threading.WaitHandle.WaitOneCore(IntPtr, Int32)
00000040CEC7F3F0 00007ffa7fe835f1 System.Threading.WaitHandle.WaitOneNoCheck(Int32) [/_/src/libraries/System.Private.CoreLib/src/System/Threading/WaitHandle.cs @ 131]
00000040CEC7F450 00007ffadcb7e8ee System.Diagnostics.Tracing.CounterGroup.PollForValues() [/_/src/libraries/System.Private.CoreLib/src/System/Diagnostics/Tracing/CounterGroup.cs @ 351]
00000040CEC7F730 00007ffade16dd63 [DebuggerU2MCatchHandlerFrame: 00000040cec7f730]
OS Thread Id: 0xf174
Child SP IP Call Site
00000040CD94F228 00007ffb771b0d24 [HelperMethodFrame: 00000040cd94f228] System.Threading.WaitHandle.WaitOneCore(IntPtr, Int32)
00000040CD94F330 00007ffa7fe835f1 System.Threading.WaitHandle.WaitOneNoCheck(Int32) [/_/src/libraries/System.Private.CoreLib/src/System/Threading/WaitHandle.cs @ 131]
00000040CD94F390 00007ffadca89486 System.Threading.PortableThreadPool+GateThread.GateThreadStart() [/_/src/libraries/System.Private.CoreLib/src/System/Threading/PortableThreadPool.GateThread.cs @ 48]
00000040CD94F700 00007ffade16dd63 [DebuggerU2MCatchHandlerFrame: 00000040cd94f700]
OS Thread Id: 0x3758
Child SP IP Call Site
00000040CD98F960 00007ffb771b3164 [InlinedCallFrame: 00000040cd98f960]
00000040CD98F960 00007ffadca8b2c0 [InlinedCallFrame: 00000040cd98f960]
00000040CD98F920 00007ffadca8b2c0 System.Threading.PortableThreadPool+IOCompletionPoller.Poll() [/_/src/libraries/System.Private.CoreLib/src/System/Threading/PortableThreadPool.IO.Windows.cs @ 188]
00000040CD98FC60 00007ffade16dd63 [DebuggerU2MCatchHandlerFrame: 00000040cd98fc60]
OS Thread Id: 0x147f0
Child SP IP Call Site
00000040CD9CF4D0 00007ffb771b3164 [InlinedCallFrame: 00000040cd9cf4d0]
00000040CD9CF4D0 00007ffadca8b2c0 [InlinedCallFrame: 00000040cd9cf4d0]
00000040CD9CF490 00007ffadca8b2c0 System.Threading.PortableThreadPool+IOCompletionPoller.Poll() [/_/src/libraries/System.Private.CoreLib/src/System/Threading/PortableThreadPool.IO.Windows.cs @ 188]
00000040CD9CF7D0 00007ffade16dd63 [DebuggerU2MCatchHandlerFrame: 00000040cd9cf7d0]
OS Thread Id: 0x11f9c
Child SP IP Call Site
OS Thread Id: 0x8084
Child SP IP Call Site
00000040CF3FF368 00007ffb771b0d24 [HelperMethodFrame_1OBJ: 00000040cf3ff368] System.Threading.Monitor.ObjWait(Int32, System.Object)
00000040CF3FF490 00007ffa7fa0657e System.Threading.Monitor.Wait(System.Object, Int32) [/_/src/coreclr/System.Private.CoreLib/src/System/Threading/Monitor.CoreCLR.cs @ 156]
00000040CF3FF4C0 00007ffa7e67a7ef Microsoft.Extensions.Logging.Console.ConsoleLoggerProcessor.TryDequeue(Microsoft.Extensions.Logging.Console.LogMessageEntry ByRef) [/_/src/libraries/Microsoft.Extensions.Logging.Console/src/ConsoleLoggerProcessor.cs @ 165]
00000040CF3FF550 00007ffa7e67a70d Microsoft.Extensions.Logging.Console.ConsoleLoggerProcessor.ProcessLogQueue() [/_/src/libraries/Microsoft.Extensions.Logging.Console/src/ConsoleLoggerProcessor.cs @ 104]
00000040CF3FF820 00007ffade16dd63 [DebuggerU2MCatchHandlerFrame: 00000040cf3ff820]
OS Thread Id: 0x1920
Child SP IP Call Site
00000040CF57F1C8 00007ffb771b0d24 [HelperMethodFrame: 00000040cf57f1c8] System.Threading.WaitHandle.WaitMultipleIgnoringSyncContext(IntPtr*, Int32, Boolean, Int32)
00000040CF57F2F0 00007ffadca715c9 System.Threading.WaitHandle.WaitMultiple(System.ReadOnlySpan`1<System.Threading.WaitHandle>, Boolean, Int32) [/_/src/libraries/System.Private.CoreLib/src/System/Threading/WaitHandle.cs @ 269]
00000040CF57F3C0 00007ffadca713be System.Threading.WaitHandle.WaitMultiple(System.Threading.WaitHandle[], Boolean, Int32) [/_/src/libraries/System.Private.CoreLib/src/System/Threading/WaitHandle.cs @ 236]
00000040CF57F400 00007ffadc402e29 Microsoft.Build.BackEnd.Logging.LoggingService.g__LoggingEventProc|141_0() [/_/src/Build/BackEnd/Components/Logging/LoggingService.cs @ 1351]
00000040CF57F680 00007ffade16dd63 [DebuggerU2MCatchHandlerFrame: 00000040cf57f680]
OS Thread Id: 0x144f4
Child SP IP Call Site
00000040CF87F4E8 00007ffb771b0294 [InlinedCallFrame: 00000040cf87f4e8] Interop+Kernel32.g____PInvoke|199_0(IntPtr, Byte*, Int32, Int32*, System.Threading.NativeOverlapped*)
00000040CF87F4E8 00007ffa8011ca76 [InlinedCallFrame: 00000040cf87f4e8] Interop+Kernel32.g____PInvoke|199_0(IntPtr, Byte*, Int32, Int32*, System.Threading.NativeOverlapped*)
00000040CF87F4B0 00007ffa8011ca76 Interop+Kernel32.ReadFile(System.Runtime.InteropServices.SafeHandle, Byte*, Int32, Int32 ByRef, System.Threading.NativeOverlapped*) [/_/src/coreclr/System.Private.CoreLib/Microsoft.Interop.LibraryImportGenerator/Microsoft.Interop.LibraryImportGenerator/LibraryImports.g.cs @ 6838]
00000040CF87F590 00007ffa8011c8d0 System.IO.RandomAccess.ReadAtOffset(Microsoft.Win32.SafeHandles.SafeFileHandle, System.Span`1, Int64) [/_/src/libraries/System.Private.CoreLib/src/System/IO/RandomAccess.Windows.cs @ 52]
00000040CF87F640 00007ffa8011c6a3 System.IO.RandomAccess+c.b__19_0(System.ValueTuple`4<Microsoft.Win32.SafeHandles.SafeFileHandle,System.Memory`1,Int64,System.IO.Strategies.OSFileStreamStrategy>) [/_/src/libraries/System.Private.CoreLib/src/System/IO/RandomAccess.Windows.cs @ 257]
00000040CF87F6B0 00007ffa8011ffd3 System.Threading.AsyncOverSyncWithIoCancellation+d__8`2[[System.ValueTuple`4[[System.__Canon, System.Private.CoreLib],[System.Memory`1[[System.Byte, System.Private.CoreLib]], System.Private.CoreLib],[System.Int64, System.Private.CoreLib],[System.__Canon, System.Private.CoreLib]], System.Private.CoreLib],[System.Int32, System.Private.CoreLib]].MoveNext() [/_/src/libraries/Common/src/System/Threading/AsyncOverSyncWithIoCancellation.cs @ 135]
00000040CF87F760 00007ffa7fa158b0 System.Threading.ExecutionContext.RunInternal(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object) [/_/src/libraries/System.Private.CoreLib/src/System/Threading/ExecutionContext.cs @ 179]
00000040CF87F7D0 00007ffa7fa190fa System.Runtime.CompilerServices.TaskAwaiter+c.b__12_0(System.Action, System.Threading.Tasks.Task) [/_/src/libraries/System.Private.CoreLib/src/System/Runtime/CompilerServices/TaskAwaiter.cs @ 273]
00000040CF87F920 00007ffa801217b5 System.Threading.Tasks.AwaitTaskContinuation.System.Threading.IThreadPoolWorkItem.Execute() [/_/src/libraries/System.Private.CoreLib/src/System/Threading/Tasks/TaskContinuation.cs @ 647]
00000040CF87FA40 00007ffa7fa1c183 System.Threading.ThreadPoolWorkQueue.Dispatch() [/_/src/libraries/System.Private.CoreLib/src/System/Threading/ThreadPoolWorkQueue.cs @ 994]
00000040CF87FAC0 00007ffadca8c723 System.Threading.PortableThreadPool+WorkerThread.WorkerThreadStart() [/_/src/libraries/System.Private.CoreLib/src/System/Threading/PortableThreadPool.WorkerThread.NonBrowser.cs @ 102]
00000040CF87FDF0 00007ffade16dd63 [DebuggerU2MCatchHandlerFrame: 00000040cf87fdf0]
OS Thread Id: 0xa890
Child SP IP Call Site
00000040CFA3F558 00007ffb771b0d24 [HelperMethodFrame: 00000040cfa3f558] System.Threading.WaitHandle.WaitOneCore(IntPtr, Int32)
00000040CFA3F660 00007ffa7fe835f1 System.Threading.WaitHandle.WaitOneNoCheck(Int32) [/_/src/libraries/System.Private.CoreLib/src/System/Threading/WaitHandle.cs @ 131]
00000040CFA3F6C0 00007ffadca8230f System.Threading.TimerQueue.TimerThread() [/_/src/libraries/System.Private.CoreLib/src/System/Threading/TimerQueue.Portable.cs @ 87]
00000040CFA3F980 00007ffade16dd63 [DebuggerU2MCatchHandlerFrame: 00000040cfa3f980]
OS Thread Id: 0x38cc
Child SP IP Call Site
OS Thread Id: 0x133f8
Child SP IP Call Site
OS Thread Id: 0x3ac4
Child SP IP Call Site
The async call stack looks like this:
0:010> !dumpasync
STACK 1
0000022fc5458ee0 00007ffa7e946a40 ( ) System.Threading.Tasks.ValueTask<System.Int32>+ValueTaskSourceAsTask
0000022fc5458f88 00007ffa7e947e78 (0) Microsoft.Extensions.HotReload.PayloadTypeReaderWriter+<ReadNextPayloadTypeAsync>d__0 @ 7ffa7e628ce0
0000022fc545b200 00007ffa7e94d268 (0) StartupHook+<ReceiveDeltas>d__4 @ 7ffa7e6284e0
0000022fc545b328 00007ffa7e971378 (1) StartupHook+<>c__DisplayClass1_0+<<Initialize>b__0>d @ 7ffa7e627420
0000022fc54596e0 00007ffa7e9249b8 ( ) System.Threading.Tasks.UnwrapPromise<System.Threading.Tasks.VoidTaskResult>
STACK 2
<< Awaiting: 0000022fc856bb78 00007ffa7f6f6ce8 System.Runtime.CompilerServices.ConfiguredValueTaskAwaitable<System.Int32>+ConfiguredValueTaskAwaiter >>
0000022fc856baf0 00007ffa7f9290b0 (1) System.IO.Strategies.BufferedFileStreamStrategy+<ReadFromNonSeekableAsync>d__36 @ 7ffa8011f780
0000022fc856bc10 00007ffa7f6fe9f0 (1) System.IO.StreamReader+<ReadBufferAsync>d__72 @ 7ffa8011aa80
0000022fc856bdb0 00007ffa7f929590 (0) System.IO.StreamReader+<ReadLineAsyncInternal>d__63 @ 7ffa801201c0
0000022fc86154f0 00007ffa7f9299e0 (0) Aida.TestGen.CSharp.Tools.ProcessRunner+<Run>d__0 @ 7ffa7f6dbfc0
0000022fc8615638 00007ffa7fbc6ca8 (0) Aida.TestGen.CSharp.Tools.TestCodeThruDotNet+<Test>d__4 @ 7ffa7fa13b60
0000022fc8606ef0 00007ffa80209578 (2) Aida.Cli.UnitTestEngine+<BuildAndTest>d__33 @ 7ffa8010b910
0000022fc85e3da0 00007ffa800e28b0 (2) Aida.Cli.UnitTestEngine+<RunTest>d__32 @ 7ffa800f09f0
0000022fc85b5980 00007ffa800b8780 (1) Aida.Cli.UnitTestEngine+<Generate>d__18 @ 7ffa7fe92980
0000022fc5e47940 00007ffa7f7215b8 (4) Aida.Cli.UnitTestEngine+<<Run>g__InternalRun|14_0>d @ 7ffa7ee956a0
0000022fc5e47a38 00007ffa7f721978 (0) Aida.Cli.Program+<>c__DisplayClass2_0+<<Main>b__9>d @ 7ffa7e663ef0
0000022fc5e47b48 00007ffa7f721d00 (1) System.CommandLine.Invocation.AnonymousCommandHandler+<InvokeAsync>d__6 @ 7ffa7e663a00
0000022fc5e47c68 00007ffa7f722100 (0) System.CommandLine.Invocation.InvocationPipeline+<>c__DisplayClass4_0+<<BuildInvocationChain>b__0>d @ 7ffa7e6635d0
0000022fc5e47d88 00007ffa7f7273c8 (0) System.CommandLine.Builder.CommandLineBuilderExtensions+<>c__DisplayClass17_0+<<UseParseErrorReporting>b__0>d @ 7ffa7e662d80
0000022fc5e47ec0 00007ffa7f729488 (0) Aida.Cli.Program+<>c__DisplayClass2_0+<<Main>b__8>d @ 7ffa7e6476c0
0000022fc5e47fb8 00007ffa7f729848 (0) System.CommandLine.Builder.CommandLineBuilderExtensions+<>c__DisplayClass12_0+<<UseHelp>b__0>d @ 7ffa7e646eb0
0000022fc5e4aee0 00007ffa7f72c328 (0) System.CommandLine.Builder.CommandLineBuilderExtensions+<>c__DisplayClass22_0+<<UseVersionOption>b__0>d @ 7ffa7e6468f0
0000022fc5e4b018 00007ffa7f72e150 (0) System.CommandLine.Builder.CommandLineBuilderExtensions+<>c__DisplayClass19_0+<<UseTypoCorrections>b__0>d @ 7ffa7e646240
0000022fc5e4b138 00007ffa7f754bf8 (0) System.CommandLine.Builder.CommandLineBuilderExtensions+<>c+<<UseSuggestDirective>b__18_0>d @ 7ffa7e645da0
0000022fc5e4b6a8 00007ffa7f7567a8 (0) System.CommandLine.Builder.CommandLineBuilderExtensions+<>c__DisplayClass16_0+<<UseParseDirective>b__0>d @ 7ffa7e645940
0000022fc5e4b7c8 00007ffa7f757440 (1) System.CommandLine.Builder.CommandLineBuilderExtensions+<>c+<<RegisterWithDotnetSuggest>b__5_0>d @ 7ffa7e644de0
0000022fc5e4b8e0 00007ffa7f758448 (0) System.CommandLine.Builder.CommandLineBuilderExtensions+<>c__DisplayClass8_0+<<UseExceptionHandler>b__0>d @ 7ffa7e6445d0
0000022fc5e4ba00 00007ffa7f7594f0 (0) System.CommandLine.Builder.CommandLineBuilderExtensions+<>c+<<CancelOnProcessTermination>b__1_0>d @ 7ffa7e643f40
0000022fc5e4bb38 00007ffa7f759db0 (0) System.CommandLine.Invocation.InvocationPipeline+<<InvokeAsync>g__FullInvocationChainAsync|2_0>d @ 7ffa7e643580
0000022fc5e4bc48 00007ffa7f75a5c8 (0) System.CommandLine.Parsing.ParseResultExtensions+<InvokeAsync>d__0 @ 7ffa7e642ef0
0000022fc5e4bd60 00007ffa7f75cba8 (0) System.CommandLine.Parsing.ParserExtensions+<InvokeAsync>d__3 @ 7ffa7e631a70
0000022fc5e55550 00007ffa7f75e800 (0) Aida.Cli.Program+<Main>d__2 @ 7ffa7e62ad40
0000022fc5e55648 00007ffa7e8cb880 () System.Threading.Tasks.Task+SetOnInvokeMres
So I think I know why the deadlock occurs; however, I want to learn how to deduce it from the memory dump. But with !sos.syncblk
being useless and !sosex.dlk
being unavailable, I have no idea how to proceed.
EDIT 1
I have implemented a small reproduction here - https://github.com/MarkKharitonov/InterProcessDeadlock. The problematic code is on the master
branch, the fixed version is on the fix
branch. Of course, the aforementioned stack traces do not apply to this small reproduction. The repo has a README file which is self-explanatory.
Assuming one can obtain the memory dumps of both the parent and child processes, I am really curious how this deadlock can be deduced from them.
Once again, please, note that these are .NET Core processes and therefore sosex is unavailable.
I will use these facts/assumptions for my diagnosis: you want to use the !syncblk
and !dlk
commands to diagnose the deadlock. This approach will never work. Let's see why.
There are 4 preconditions for a deadlock to happen:
1. Mutual exclusion: a resource can be held by only one thread at a time.
2. Hold and wait: a thread holds one resource while waiting for another.
3. No preemption: a resource cannot be forcibly taken away from the thread holding it.
4. Circular wait: there is a circular chain of threads, each waiting for a resource held by the next.
Tools or commands like !dlk
will assume points 1 to 3 as given, and then help you with point 4, figuring out where the dependency is.
Those commands all rely on the assumption that you can find the deadlock within the crash dump (or live process) that you have. That is, they will look for resources within the process and correlate them to the threads within the process to find the circular dependency.
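For contrast, here is a hedged sketch (not code from the question) of the kind of single-process deadlock those commands are designed for: two threads, two monitors, circular wait, all visible within one dump.

using System;
using System.Threading;

class SingleProcessDeadlock
{
    private static readonly object LockA = new object();
    private static readonly object LockB = new object();

    static void Main()
    {
        var t1 = new Thread(() =>
        {
            lock (LockA)
            {
                Thread.Sleep(100);   // give t2 time to acquire LockB
                lock (LockB) { }     // blocks forever: t2 holds LockB
            }
        });
        var t2 = new Thread(() =>
        {
            lock (LockB)
            {
                Thread.Sleep(100);
                lock (LockA) { }     // blocks forever: t1 holds LockA
            }
        });
        t1.Start(); t2.Start();
        t1.Join(); t2.Join();
    }
}

In a dump of that process, !syncblk would show both monitors together with their owning threads, so the circle closes within the same dump.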
This is not what you have. Your deadlock is between two processes, or (more precisely) between threads that belong to two different processes. In order to diagnose such a situation, you would need multiple crash dumps, one for each process that is involved.
However, even if you have that, none of the WinDbg commands I know operate across multiple processes, so this is something you would still need to do manually. Usually that means going down the rabbit hole of analyzing every WaitForSingleObject()
and WaitForMultipleObjects()
call, finding the object that is waited for and building a dependency graph.
With only one crash dump, you will only find a thread waiting, but nothing which waits for this thread, because that information is in a different crash dump (or process), so the dependency graph will never result in a circle and thus, no deadlock is detected.
Note: this situation may be indistinguishable from one where a thread in your application once held a resource but has since terminated (e.g. due to an exception) and is gone. Because the thread is gone, it will not participate in building the dependency graph, and thus no deadlock will be found either. IMHO, this situation is rare (it has happened to me just once).
(You may need ~2 hours to reproduce this.)
I fetched the code from the GitHub repo and reproduced the issue. In order to create crash dumps, I increased the timeout of the cancellation token to 500000. I opened 2 WinDbg instances and attached to the two processes by executable name.
In order to find out which process is which, we look at the command line, which is stored in the PEB (process environment block), like this:
.shell -ci "!peb" findstr "CommandLine"
Based on the command line output, we then take .NET-compatible crash dumps:
.dump /ma parent.dmp
.dump /ma child.dmp
I then closed the live processes, restarted WinDbg and started new sessions on the crash dump files. Since we work with 2 WinDbg windows, I like to rename the window titles, so I can easily identify them in the task bar.
.wtitle parent
.wtitle child
Just a quick cross check: each of the dumps should now be stopped at a breakpoint injected by the debugger.
0:008> .exr -1
[...]
ExceptionCode: 80000003 (Break instruction exception)
For the analysis, you can exclude the debugger thread, since it is not involved in the deadlock. You find it with ~
and look for the thread which has the exception (marked by #
or .
if there's no hash).
!analyze -v
will not help: it performs an exception analysis, and it won't reveal anything new, since we already know the process is stopped at a breakpoint. !analyze -hang
won't reveal anything either.
For .NET 5+ projects, there are no deadlock specific SOSex commands, but as mentioned earlier, they won't work across different processes anyway.
Let's start with the parent and determine which threads to look at. We are interested in all threads that are waiting for a resource, so we do
0:000> ||
. 0 Full memory user mini dump: [...]\parent.dmp
0:000> !findstack ntdll!NtWaitForSingleObject
0:000> !findstack ntdll!NtWaitForMultipleObjects
Thread 000, 1 frame(s) match [...]
Thread 001, 1 frame(s) match [...]
Thread 002, 1 frame(s) match [...]
Thread 003, 1 frame(s) match [...]
Thread 004, 1 frame(s) match [...]
Thread 006, 1 frame(s) match [...]
So there are 6 threads to analyze. We start with thread 0:
0:000> ~0s
ntdll!NtWaitForMultipleObjects+0x14:
[...]
0:000> .frame 1
01 000000df`ce17e310 00007ffa`255aa274 KERNELBASE!WaitForMultipleObjectsEx+0xf0
0:000> r rcx; r rdx; *** first two parameters
rcx=0000000000000001
rdx=0000028cf2b3fa68
Since we have x64 calling conventions, RCX carries nCount
and RDX has *lpHandles
(see WaitForMultipleObjectsEx). Dumping the memory contents will give us the handles being waited for. We use nCount
as argument for dp
to print exactly the right amount of handles.
0:000> dp rdx L@rcx
0000028c`f2b3fa68 00000000`0000021c
And !handle
gives us the event type:
0:000> !handle 21c f
Handle 000000000000021c
Type Event
Attributes 0
GrantedAccess 0x1f0003:
Delete,ReadControl,WriteDac,WriteOwner,Synch
QueryState,ModifyState
HandleCount 2
PointerCount 65537
Name <none>
Object specific information
Event Type Manual Reset
Event is Waiting
Unfortunately, an Event does not have an owning thread. Events can be set/reset from any thread. That makes analysis hard.
Let's make commands a bit more concise for a quicker analysis.
0:000> ~1s; .frame 1; r rcx; r rdx; dp rdx L@rcx;
ntdll!NtWaitForMultipleObjects+0x14:
00007ffb`8300e044 c3 ret
01 000000df`ce2ff0f0 00007ffb`80a93e2e KERNELBASE!WaitForMultipleObjectsEx+0xf0
rcx=0000000000000001
rdx=000000dfce2ff460
000000df`ce2ff460 00000000`000001c8
0:001> !handle 1c8 f
[...]
Object specific information
Event Type Manual Reset
Event is Waiting
You do that for all the waiting threads.
Next, check what the remaining threads are doing. One of them might set an event and resolve the deadlock. I usually use the k
command, but in the following, I use more specific commands to get only the relevant output.
0:006> ~5k L2
# Child-SP RetAddr Call Site
00 000000df`cea7f238 00007ffb`80a62770 ntdll!NtReadFile+0x14
01 000000df`cea7f240 00007ffa`24ac9fcf KERNELBASE!ReadFile+0x100
0:005> .frame 1
01 000000df`cea7f240 00007ffa`24ac9fcf KERNELBASE!ReadFile+0x100
0:005> r rcx
rcx=00000000000002ec
0:005> !handle 2ec f
Handle 00000000000002ec
Type File
Attributes 0
GrantedAccess 0x120189:
ReadControl,Synch
Read/List,ReadEA,ReadAttr,WriteAttr
HandleCount 2
PointerCount 65537
No object specific information available
Thread 5 is trying to read a file (Handle 2EC in RCX, see ReadFile) and probably waiting for IO.
0:007> ~7e .frame 8
08 000000df`cebff3c0 00007ffa`255c51ad coreclr!TieredCompilationManager::HandleCallCountingForFirstCall+0x16c [...]
Thread 7 seems to be JIT-compiling code. That thread might resolve the deadlock, if the newly compiled code sets an event which wakes up a waiting thread.
0:007> ~8k L1
# Child-SP RetAddr Call Site
00 000000df`ce77fdb8 00007ffb`8303ca7e ntdll!DbgBreakPoint
Thread 8 is the debug breakpoint.
0:007> ~9k
# Child-SP RetAddr Call Site
00 000000df`ced7fa38 00000000`00000000 ntdll!RtlUserThreadStart
Thread 9 is just starting and has no call stack yet (why? I have no idea).
Let's enhance our understanding by using ~
(the thread names are helpful) and !clrstack
where available.
0:000> ~
. 0 Id: 6b78.69d8 Suspend: 1 Teb: 000000df`cde43000 Unfrozen
1 Id: 6b78.1320 Suspend: 1 Teb: 000000df`cde45000 Unfrozen ".NET EventPipe"
2 Id: 6b78.6a68 Suspend: 1 Teb: 000000df`cde47000 Unfrozen ".NET Debugger"
3 Id: 6b78.36a4 Suspend: 1 Teb: 000000df`cde49000 Unfrozen ".NET Finalizer"
4 Id: 6b78.2588 Suspend: 1 Teb: 000000df`cde4f000 Unfrozen ".NET Timer"
5 Id: 6b78.a74 Suspend: 1 Teb: 000000df`cde51000 Unfrozen ".NET TP Worker"
6 Id: 6b78.25f0 Suspend: 1 Teb: 000000df`cde53000 Unfrozen ".NET TP Gate"
7 Id: 6b78.1638 Suspend: 1 Teb: 000000df`cde55000 Unfrozen ".NET TP Worker"
# 8 Id: 6b78.60e0 Suspend: 1 Teb: 000000df`cde57000 Unfrozen
9 Id: 6b78.3f08 Suspend: 0 Teb: 000000df`cde59000 Unfrozen
0:000> ~*e !clrstack
[... all output shortened ...]
OS Thread Id: 0x69d8 (0)
000000DFCE17EBE0 00007ffa24d4c570 System.Runtime.CompilerServices.TaskAwaiter`1[[System.Int32, System.Private.CoreLib]].GetResult()
000000DFCE17EC10 00007ff9c5b3198b Program.(System.String[])
OS Thread Id: 0x2588 (4)
000000DFCE8FFB30 00007ffa24bd0fb4 System.Threading.WaitHandle.WaitOneNoCheck(Int32)
000000DFCE8FFB90 00007ffa24be230f System.Threading.TimerQueue.TimerThread()
OS Thread Id: 0xa74 (5)
000000DFCEA7F2C0 00007ffa24ac9fb6 Interop+Kernel32.ReadFile(System.Runtime.InteropServices.SafeHandle, Byte*, Int32, Int32 ByRef, System.Threading.NativeOverlapped*)
OS Thread Id: 0x25f0 (6)
000000DFCDCEF718 00007ffb8300e044 System.Threading.WaitHandle.WaitOneCore(IntPtr, Int32)
000000DFCDCEF820 00007ffa24bd0fb4 System.Threading.WaitHandle.WaitOneNoCheck(Int32)
000000DFCDCEF880 00007ffa24be9486 System.Threading.PortableThreadPool+GateThread.GateThreadStart()
OS Thread Id: 0x1638 (7)
000000DFCEBFF9E8 00007ffb82f9b8a4 System.Threading.PortableThreadPool+WorkerThread.ShouldExitWorker(System.Threading.PortableThreadPool, System.Threading.LowLevelLock)
000000DFCEBFFB20 00007ffa24bec7e9 System.Threading.PortableThreadPool+WorkerThread.WorkerThreadStart()
Summary of the parent:
Thread 0 ("Main thread") is waiting for Manual Reset Event 21C, likely for a task completion. From our source code, we can see that this is waiting for the task of TaskRunner.Run()
.
Thread 1 (".NET Event Pipe") is waiting for Manual Reset Event 1C8. IMHO, the event pipe thread is used for ETW event logging and thus not interesting in normal deadlock scenarios.
Thread 2 (".NET Debugger") is waiting for Auto Reset Event 1FC, Auto Reset Event 204, Auto Reset Event 1F8. IMHO, this thread will send debugging events to an IDE and thus is not interesting in normal deadlock scenarios.
Thread 3 (".NET Finalizer") is waiting for Manual Reset Event 240, Auto Reset Event 248. There may be issues related to the finalization of .NET objects, which prevent the finalizer thread from working. This finalizer thread has no stack frames except coreclr!FinalizerThread
, so there's no own code involved. Therefore I would exclude this thread from analysis.
Thread 4 (".NET Timer") is waiting for Auto Reset Event 2AC.
Thread 5 (".NET TP Worker") is trying to read the file 2EC . This is likely the task of TaskRunner.Run()
which is executed here.
Thread 6 (".NET TP Gate") is waiting for Auto Reset Event 320. IMHO, this thread is responsible for creating new worker threads when required. So this is a .NET internal thread which we can exclude from analysis.
Thread 7 (".NET TP Worker") is compiling some code that might resolve the deadlock, e.g. by writing data to Handle 2EC.
Thread 8 is the debugger thread. You can always exclude that one in any analysis.
Thread 9 has no callstack (yet?)
The major dependency graph is: Thread 0 waits for a task, which is run on thread 5 and waiting for a file to be read.
0:003> ~
0 Id: 10d8.3a48 Suspend: 1 Teb: 00000004`21206000 Unfrozen
1 Id: 10d8.588 Suspend: 1 Teb: 00000004`21208000 Unfrozen ".NET EventPipe"
2 Id: 10d8.2200 Suspend: 1 Teb: 00000004`2120a000 Unfrozen ".NET Debugger"
. 3 Id: 10d8.2e1c Suspend: 1 Teb: 00000004`2120c000 Unfrozen ".NET Finalizer"
# 4 Id: 10d8.3734 Suspend: 1 Teb: 00000004`21212000 Unfrozen
5 Id: 10d8.307c Suspend: 0 Teb: 00000004`21214000 Unfrozen
We exclude threads 1, 2 and 3 due to their special nature in .NET (see above). We exclude thread 4 because it's the injected breakpoint, and thread 5 because it also has no call stack (like thread 9 in the parent).
0:005> ~0s
ntdll!NtWriteFile+0x14:
00007ffb`8300d5f4 c3 ret
0:000> .frame 1
01 00000004`2157e0c0 00007ffa`aca37d84 KERNELBASE!WriteFile+0x76
0:000> r rcx
rcx=00000000000002f0
0:000> !handle 2f0 f
Handle 00000000000002f0
Type File
Attributes 0
GrantedAccess 0x120196:
ReadControl,Synch
Write/Add,Append/SubDir/CreatePipe,WriteEA,ReadAttr,WriteAttr
HandleCount 2
PointerCount 65450
No object specific information available
(RCX holds the first parameter of WriteFile)
The parent waits for thread 5 which tries to read on Handle 2EC (stdout
), but can't read anything from there. The child thread 0 is trying to write to a different file on Handle 2F0 (stderr
), but can't write any more because the buffer is full.
This is actually one of the hard cases, where you need to reason about a deadlock which you can't see in the dependency graph, since 2EC and 2F0 look unrelated.
The problem will go away if you consume the data correctly.
So either the child writes to stdout:
Console.Out.WriteLine($"{lineContent} #{i + 1}");
or the ProcessRunner reads stderr line by line instead, i.e. flip the streams here:
while ((line = await process.StandardError.ReadLineAsync(cancellationToken)) != null)
{
    notifyStdOutLine?.Invoke(line);
}
stdErr = await process.StandardOutput.ReadToEndAsync(cancellationToken);
However, both of these "solutions" require the child to be a white box. It would definitely be more advisable to use a true asynchronous approach, so that any change to the child can be handled by the parent.
See the Microsoft documentation, especially the Remarks section about deadlocks, or @Blindly's answer.