I've created a tool that finds and resumes suspended threads of a specific process (the reason behind that tool is irrelevant). To test the tool, I've created the following unit test:
// Arrange
var timeout = TimeSpan.FromSeconds(10);
var locker = new object();
var suspendEvent = new ManualResetEventSlim(false);
ulong threadId = 0;
var thread = new Thread(() =>
{
lock (locker)
{
threadId = InteropAPI.TRGetCurrentThreadId();
}
Log($"Thread started: {threadId}");
suspendEvent.Wait();
Log($"Thread finished: {threadId}");
});
thread.Start();
for (var i = 0; i < 100; i++)
{
lock (locker)
{
if (threadId != 0)
{
break;
}
}
Thread.Sleep(10);
}
Assert.That(threadId, Is.Not.EqualTo(0));
// Act
Log("Suspending thread...");
InteropAPI.TRSuspendThread(threadId);
Log("Thread suspended");
suspendEvent.Set();
Log("Event set");
// Assert part omitted
The TRGetCurrentThreadId
simply calls winapi GetCurrentThreadId while TRSuspendThread
calls SuspendThread.
This version of the test works just fine, executed thousands of times. Here's a slightly cleaner version which tries to get rid of the threadId
wait hack:
// Arrange
var timeout = TimeSpan.FromSeconds(10);
var locker = new object();
var suspendEvent = new ManualResetEventSlim(false);
var threadSetEvent = new ManualResetEventSlim(false);
ulong threadId = 0;
var thread = new Thread(() =>
{
lock (locker)
{
threadId = InteropAPI.TRGetCurrentThreadId();
}
threadSetEvent.Set();
Log($"Thread started: {threadId}");
suspendEvent.Wait();
Log($"Thread finished: {threadId}");
});
thread.Start();
threadSetEvent.Wait(timeout);
Assert.That(threadId, Is.Not.EqualTo(0));
// Act
Log("Suspending thread...");
InteropAPI.TRSuspendThread(threadId);
Log("Thread suspended");
suspendEvent.Set();
Log("Event set");
The problem with this version is, it gets stuck. Specifically, in the call to suspendEvent.Set()
. The logs printed:
Thread started: 37348
Suspending thread...
Thread suspended
The thread gets suspended as expected, as confirmed via process explorer. Forcefully resuming the thread using process explorer makes the test finish and print the rest of the logs:
Event set
Thread finished: 37348
Initially, I thought that perhaps runtime moves the managed, test-running thread, onto a suspended system thread, however printing the TRGetCurrentThreadId
in the main thread doesn't confirm that.
The native call stack for the test-running thread, at the time of deadlock, seems to be:
ntdll.dll!NtWaitForAlertByThreadId+0x14
ntdll.dll!TpWorkOnBehalfClearTicket+0x2c3
ntdll.dll!RtlEnterCriticalSection+0x254
ntdll.dll!RtlEnterCriticalSection+0x42
Stack of the suspended thread:
ntdll.dll!ZwCreateEvent+0x14
KERNELBASE.dll!CreateEventW+0x95
suspendEvent.Set()
? Does the call to Set()
require the waiting thread to not be in a suspended state or otherwise it can deadlock?Remarks:
I think that threadSetEvent.Set()
introduces a race between the calls to suspendEvent.Wait()
and suspendEvent.Set()
.
When your test thread calls suspendEvent.Set();
, it tries to acquire the m_lock
of the ManualResetEventSlim
(source code).
private void Set(bool duringCancellation) {
IsSet = true;
if (Waiters > 0) {
lock (m_lock) {
Monitor.PulseAll(m_lock);
}
}
However, the suspended thread might have already acquired this lock with the call to Wait()
before it was suspended:
lock (m_lock) {
while (!IsSet) {
Waiters++;
if (IsSet) // This check must occur after updating Waiters.
{
Waiters--; // revert the increment.
return true;
}
try {
// ** the actual wait **
if (!Monitor.Wait(m_lock, realMillisecondsTimeout))
return false; // return immediately if the timeout has expired.
} finally {
// Clean up: we're done waiting.
Waiters--;
}
}
}
The call to Monitor.Wait()
will release the lock, so maybe the thread is suspended before the call to it - causing the dead-lock. When you resume the thread it releases it, so the event can be set by the test thread.
I think you can't really remove the need for "a hack" in that you might need to non-blockingly wait, but maybe make things a bit cleaner with the use of a Barrier
because it's SignalAndWait
doesn't take a lock at least on Signal
(Set) part.:
// Arrange
var timeout = TimeSpan.FromSeconds(10);
var threadSetEvent = new ManualResetEventSlim(false);
var barrier = new Barrier(2);
uint threadId = 0;
var thread = new Thread(() => {
threadId = InteropAPI.GetCurrentThreadId();
threadSetEvent.Set();
Console.WriteLine($"Thread started: {threadId}");
barrier.SignalAndWait();
Console.WriteLine($"Thread finished: {threadId}");
});
thread.Start();
threadSetEvent.Wait(timeout);
// Act
Console.WriteLine("Suspending thread...");
var spinWait = new SpinWait();
// make sure SignalAndWait() is called from the other thread
// AND in a Waiting state
// before we Suspend it
while (barrier.ParticipantsRemaining != 1) {
// maybe also add || thread.ThreadState == ThreadState.WaitSleepJoin
spinWait.SpinOnce();
}
InteropAPI.SuspendThreadById(threadId);
Console.WriteLine("Thread suspended");
// this unblocks the other thread
// which when resumed will be able to proceed
barrier.SignalAndWait();
Console.WriteLine("Event set");
InteropAPI.ResumeThreadById(threadId);