I am surprised to see from pstack
that this code leads to deadlock! I can't see why it should.
pthread_mutex_t lock;
_Cilk_for (int i = 0; i < N; ++i) {
int ai = A[i];
if (ai < pivot) {
pthread_mutex_lock(&lock);
A[ia++] = ai;
pthread_mutex_unlock(&lock);
}
else if (ai > pivot) {
pthread_mutex_lock(&lock);
A[ib++] = ai;
pthread_mutex_unlock(&lock);
}
else {
pthread_mutex_lock(&lock);
A[ic++] = ai;
pthread_mutex_unlock(&lock);
}
}
I am just using a mutex to make sure that each access to A is atomic and serialized.
If that's code inside a function, then you're not initialising the mutex correctly. You need to set it to PTHREAD_MUTEX_INITIALIZER
(for a simple, default mutex) or call pthread_mutex_init()
on it (for more complex requirements). Without proper initialisation, you don't know what state the mutex starts in; it may well appear to be locked simply because whatever happened to be on the stack at that position looked like a locked mutex.
That's why it always needs to be initialised somehow, so that there is no doubt of the initial state.
Another potential problem you may have is this:
int ai = A[i];
You probably should protect that access with the same mutex since otherwise you may read it in a "half-state" (when another thread is only part way through updating the variable).
And, I have to say, I'm not sure that threads are being used wisely here. The cost of locking and unlocking the mutex is likely to swamp a statement as cheap as A[ia++] = ai,
to the point where the vast majority of the time will be spent in the mutex calls rather than the work. Mutexes are generally more useful where the code being executed under the lock is a little more substantial.
You may find a non-threaded variant will blow this one out of the water (but, of course, don't take my word for it - my primary optimisation mantra is "measure, don't guess").