As I explain in this question, I am having some issues with my Threadpool implementation.
Since the previous question is a bit too broad, I thought I'd also ask a very specific question.
The way I implemented the Threadpool is that worker threads go through 2 phases. First, wait for main thread to give the go signal, the sleep phase. After that, the worker threads take jobs from a queue and execute them asynchronously, the work phase. When the queue is empty, the workers go back to the sleep phase.
The way I implemented that is via 2 mutexes that only the main thread has access to. All the worker threads have access to is a pointer to one of the mutexes.
Before creating the worker threads, the main thread creates the 2 mutexes and locks the first one. Then it creates the threads and they try to lock the mutex therefore going to sleep. After creating the job queue and before unlocking mutex 1, it locks mutex 2 and sets the pointer of the worker threads to mutex 2. Then it unlocks mutex 1 and the worker threads basically lock the mutex and immediately release it again and go to work. After that they try to lock mutex 2 this time and go to sleep. Now, this repeats as often as desired and the main thread basically ping-pongs between the 2 mutexes.
One thing this scheme relies on is that worker threads store a local copy of the mutex pointer so that they can lock and unlock on the same mutex while the main thread can safely change the shared pointer.
Here is the related code:
// function for worker threads
// the 2 nested loops correspond to a waiting phase and a work phase
static void* worker_thread(void* arguments){
WorkerThread* worker_thread = (WorkerThread*) arguments;
SynchronizationHandle* sync_handle = worker_thread->sync_handle;
// the outer loop deals with waiting on the main thread to start work or terminate
while(true){
// adress is saved to local variable so that main thread can change adress to other mutex without race condition
pthread_mutex_t* const barrier_mutex = (pthread_mutex_t* const)sync_handle->barrier_mutex;
// try to lock mutex and go to sleep and wait for main thread to unlock it
pthread_mutex_lock(barrier_mutex);
// unlock it for other threads
pthread_mutex_unlock(barrier_mutex);
// the inner loop executes jobs from the work queue and checks inbetween if it should terminate
while(true){
if(sync_handle->terminate)
pthread_exit(0);
if(!WorkerThread_do_job(sync_handle, worker_thread->ID))
break;
}
}
}
typedef struct{
pthread_t* thread_handles;
WorkerThread* worker_threads;
uint worker_threads_count;
SynchronizationHandle* sync_handle;
pthread_mutex_t barrier_mutexes[2];
uint barrier_mutex_cursor;
} ThreadPool;
static void ThreadPool_wakeup_workers(ThreadPool* pool){
// ASSUMPTION: thread executing already owns pool->barrier_mutexes + pool->barrier_mutex_cursor
// compute which mutex to switch to next
uint offset = (pool->barrier_mutex_cursor + 1)%2;
// lock next mutex before wake up so that worker threads can't get hold of it before main thread
pthread_mutex_lock(pool->barrier_mutexes + offset);
// change adress to the next mutex before unlocking previous mutex, otherwise race condition
pool->sync_handle->barrier_mutex = pool->barrier_mutexes + offset;
// unlocking the previous mutex "wakes up" the worker threads since they are trying to lock it
// hence why the assumption needs to hold
pthread_mutex_unlock(pool->barrier_mutexes + pool->barrier_mutex_cursor);
pool->barrier_mutex_cursor = offset;
}
I would expect that storing the pointer in a local variable guarantees that the optimizing compiler locks on the mutex from the local variable and not for whatever reason uses the pointer from the handle. Is this expectation correct?
As I explain in the question I linked to, the expected behaviour happens in 999 times out of 1000 procedure calls. But every once in a while I get threads that don't seem to execute their jobs while still emptying the queue and signaling the main thread that their work is done. And the only non-determinism in the whole process is the OS scheduler, so I have no idea why this error happens so rarely.
Any help is much appreciated, thank you.
I would expect that storing the pointer in a local variable guarantees that the optimizing compiler locks on the mutex from the local variable and not for whatever reason uses the pointer from the handle. Is this expectation correct?
The expectation is correct, provided that the program has no undefined behavior. Among the most significant potential sources of undefined behavior in a multithreaded program are data races, which occur when two different threads both access the same object (the pointer object designated by sync_handle->barrier_mutex
, say), at least one of the accesses is a write, and there is a possible execution of the program in which no synchronization action creates a "happens-before" relationship between the accesses. The code presented appears prone to exactly such a data race.
One cannot reason from the C language specification about what a program actually will do under such circumstances, as that's exactly what "undefined behavior" means. However, in practice, a more likely manifestation than the program reading the value of a different object than you intended would be that the racy read results in a different value than you expected.