I am surprised to see from pstack
that this code leads to deadlock! I can't see why it should.
pthread_mutex_t lock;
_Cilk_for (int i = 0; i < N; ++i) {
int ai = A[i];
if (ai < pivot) {
pthread_mutex_lock(&lock);
A[ia++] = ai;
pthread_mutex_unlock(&lock);
}
else if (ai > pivot) {
pthread_mutex_lock(&lock);
A[ib++] = ai;
pthread_mutex_unlock(&lock);
}
else {
pthread_mutex_lock(&lock);
A[ic++] = ai;
pthread_mutex_unlock(&lock);
}
}
I am just using a mutex to make sure that each access to A is atomic and serialized.
If that's code inside a function, then you're not initialising the mutex correctly. You need to set it to PTHREAD_MUTEX_INITIALIZER
(for a simple, default mutex) or call pthread_mutex_init()
on it (for more complex requirements). Without proper initialisation, you don't know what state the mutex starts in; it may well appear to be locked simply because whatever happened to be on the stack at that position looked like a locked mutex.
That's why it always needs to be initialised somehow, so that there is no doubt of the initial state.
Another potential problem you may have is this:
int ai = A[i];
You probably should protect that access with the same mutex since otherwise you may read it in a "half-state" (when another thread is only part way through updating the variable).
And, I have to say, I'm not sure that threads are being used wisely here. The cost of locking and unlocking the mutex is likely to swamp a statement as cheap as A[ia++] = ai,
to the point where the vast majority of the time will be spent in the mutex calls rather than the work. Mutexes are generally more useful where the code being executed under the lock is a little more substantial.
You may find a non-threaded variant will blow this one out of the water (but, of course, don't take my word for it - my primary optimisation mantra is "measure, don't guess").