Hi, I am toying around with threads and condition variables in Ruby, and I'm getting some confusing results. I am following the ConditionVariable example from the Ruby docs, and at first everything seems to go as planned:
mutex = Mutex.new
resource = ConditionVariable.new

waiting_thread = Thread.new {
  mutex.synchronize {
    puts "Thread 'a' now needs the resource"
    resource.wait(mutex)
    puts "'a' can now have the resource"
    "a can now have the resource"
  }
}

signal_thread = Thread.new {
  mutex.synchronize {
    puts "Thread 'b' has finished using the resource"
    resource.signal
  }
}
When I run this code I get the expected output:
=> Thread 'a' now needs the resource
=> Thread 'b' has finished using the resource
=> 'a' can now have the resource
However, the MOMENT I change it up a bit to join or get the value from the waiting_thread, it blows up with a fatal deadlock error.
waiting_thread.value
signal_thread
Outputs:
= Failure/Error: waiting_thread.value -- No live threads left. Deadlock?
I can vaguely understand what is happening -- both threads are trying to synchronize on the same mutex while waiting_thread is blocked indefinitely.
But in that case, why does the initial code work flawlessly, with the puts statements printing in the expected asynchronous order?
This is somewhat important, not only for my understanding but also because I want to toy around with concurrent testing. How can I use join and value with a ConditionVariable to produce what I'm looking for?
I think the code from the Ruby docs can be a bit misleading, because it does not tell you that a signal is not buffered anywhere: if the receiver is not already waiting when the signal is sent, the signal is simply lost.
So the situation that results in a deadlock unfolds as follows:
1. signal_thread enters the critical section and calls resource.signal. Nobody is waiting yet, so this signal is lost.
2. signal_thread is done, and exits.
3. waiting_thread enters the critical section and calls resource.wait. It is now blocked waiting for a signal that never comes.
4. All threads are blocked or finished. There are no live threads left, so nobody is able to wake up waiting_thread --> deadlock error.
You can hit the deadlock error randomly with the sample code if you just keep running it, depending on your CPU, OS, and the position of the sun or moon, because the execution order of signal_thread and waiting_thread is not deterministic. A deadlock may or may not happen on any given run, but it CAN happen whenever signal_thread gets to run first.
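To make the lost signal visible without relying on luck, here is a minimal sketch that forces the bad ordering: the signaller is joined before the waiter is even created. The helper name seconds_blocked_after_lost_signal is made up for this illustration, and the timeout argument to wait is only there so the script ends instead of hanging.

```ruby
# Hypothetical helper: signals first, waits second, and returns how many
# seconds the waiter spent blocked before the timeout released it.
def seconds_blocked_after_lost_signal(timeout)
  mutex = Mutex.new
  resource = ConditionVariable.new

  # The signaller runs to completion before the waiter even exists,
  # so nobody is waiting when resource.signal is called.
  Thread.new { mutex.synchronize { resource.signal } }.join

  waiting_thread = Thread.new do
    mutex.synchronize do
      started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
      resource.wait(mutex, timeout) # the earlier signal was not stored
      Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
    end
  end
  waiting_thread.value
end

elapsed = seconds_blocked_after_lost_signal(0.3)
puts format("waiter was blocked for %.2f seconds", elapsed)
```

The waiter should stay blocked for roughly the whole timeout, because the earlier signal was never delivered to it. Without the timeout, this ordering is the one that ends in the deadlock error.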
Now how do you solve it? Well, you need to guarantee that waiting_thread waits before signal_thread signals. We can do this using a Queue, like so:
mutex = Mutex.new
resource = ConditionVariable.new
sync_queue = Queue.new

waiting_thread = Thread.new {
  mutex.synchronize {
    puts "Waiting thread sending sync message..."
    sync_queue << 1
    puts "Thread 'a' now needs the resource"
    resource.wait(mutex)
    puts "'a' can now have the resource"
    "a can now have the resource"
  }
}

signal_thread = Thread.new {
  puts "Signal thread waiting for sync..."
  # signal_thread will sleep here, until there is something in the queue to pop.
  # This guarantees the right execution order.
  sync_queue.pop
  mutex.synchronize {
    puts "Thread 'b' has finished using the resource"
    resource.signal
  }
}

waiting_thread.value
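Now the code is deterministic: waiting_thread will always wait before signal_thread signals, and the code will work as expected. This works because waiting_thread already holds the mutex when it pushes to the queue and only releases it inside resource.wait, so signal_thread cannot enter its critical section any earlier.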
You just have to be aware that the signal call of a condition variable goes up in smoke if nobody is waiting on the other end. I think this important information is missing from the docs.
In addition to that, the resource example is not really a good example of checking whether a resource is available in a critical section, because of this problem. If signal_thread has already used the resource, then waiting_thread will never know it.
In a real situation there needs to be additional data shared between the threads, so that one thread can check if a resource is in use, and only THEN wait for a signal. If the resource is not already in use, then waiting for the signal is not needed, and in fact should not be done at all.
In other words, the ConditionVariable should not be used for checking resource state, only for signaling that the state has changed. In that case we are using condition variables more appropriately.
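As a rough sketch of that idea, the version below adds a shared flag that both threads only touch while holding the mutex. The flag name resource_free, the helper run_with_predicate and its signal_first option are all made up for this illustration; signal_first: true forces the ordering that would otherwise lose the signal.

```ruby
# Hypothetical helper: runs both threads and returns waiting_thread.value.
# When signal_first is true, the signaller finishes before the waiter
# starts, which is the ordering that loses the signal.
def run_with_predicate(signal_first:)
  mutex = Mutex.new
  resource = ConditionVariable.new
  resource_free = false # shared state, only touched while holding mutex

  start_signaller = lambda do
    Thread.new do
      mutex.synchronize do
        resource_free = true # record the state change first
        resource.signal      # then wake a waiter, if there is one
      end
    end
  end

  signal_thread = start_signaller.call if signal_first
  signal_thread.join if signal_first

  waiting_thread = Thread.new do
    mutex.synchronize do
      # Wait only while the resource is still in use; the loop also
      # re-checks the flag after every wakeup.
      resource.wait(mutex) until resource_free
      "a can now have the resource"
    end
  end

  signal_thread ||= start_signaller.call
  result = waiting_thread.value
  signal_thread.join
  result
end

puts run_with_predicate(signal_first: true)
puts run_with_predicate(signal_first: false)
```

Because the waiter checks the flag before it ever calls wait, it no longer matters which thread runs first, so join and value can be used without an extra Queue to force the order.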