Search code examples
multithreadingarchitectureparallel-processingpthreadsmulticore

Dead lock without a explicit lock


I am testing a pthread program.

This program is simple. The main thread creates a child thread.

The main thread and the child thread are both operating on a queue.

The child thread keeps scanning the queue and return the minimal element and its position with a infinite loop.

The main thread also is running a loop, each iteration of which delete the minimal element calculated by the child thread from the queue, and insert some new elements to the end of the queue.

The minimal element and its position, and the queue are all declared as global variables.

The main ends when the queue is empty and it will cancel the child thread.

This progress is some like a breadth-first search.

The queue is implemented as an array with a size counter. The deletion operation is implemented as replacing the element to be deleted by the last element and decreasing the size counter by one.

No lock is used here. But when running, the program will get stuck.

What's more amazing, if I insert some printf statements to view the status, it may finish.

I want to know what causes this program endless?

struct multiblocks_pthread_args {
    volatile int local_e;
    volatile int local_v;
    volatile int local_pos;
    int* Q;
    int* val;
    volatile int* size;
} para;

volatile int update = 0;
void* child_thread ( void* args ) {
    pthread_setcanceltype ( PTHREAD_CANCEL_ASYNCHRONOUS, NULL );
    multiblocks_pthread_args* arglist = ( multiblocks_pthread_args* ) args;
    bindToCore ( 1 );
    int* list = arglist -> Q, * value = arglist -> val;
    while ( true ) {
        int size, e, v, pos;
        do {
            size = * ( arglist->size ), e, v = INF, pos = 0;
            update = 0;
            for ( int i = 0; i < size; i++ ) {
                int vi = value[i];
                if ( vi < v ) {
                    pos = i;
                    v = vi;
                }
            }
        } while ( update );
        if ( size > 0 ) e = list[pos];
        arglist->local_e = e;
        arglist->local_pos = pos;
        arglist->local_v = v;
    }
    return NULL;
}

void main_thread () {
    int size;
    int* Q = ( int* ) malloc ( sizeof ( int ) * NumNode );
    int** hash = ( int** ) malloc ( sizeof ( int* ) * numNode );
    NodeColor* color = ( NodeColor* ) malloc ( sizeof ( NodeColor ) * numNode ); 
    // NodeColor is a enum with 3 values: WHITE, GRAY, BLACK

    memset ( color, 0, sizeof ( NodeColor ) * numNode );

    pthread_t tid;
    para.val = ( int* ) malloc ( sizeof ( int ) * NumNode );
    para.Q = Q;
    para.size = &size;
    pthread_create ( &tid, NULL, child_thread, &para );

    // Only one element is in the queue
    size = 0;
    para.Q[size] = 0;
    para.val[size] = 0;
    hash[0] = &para.val[size]; // hash is used to modify the value of particular element
    ++size;
    color[0] = GRAY;
    while ( true ) {
        int global_e, global_v = INF, global_pos;
        global_e = para.local_e, global_v = para.local_v, global_pos = para.local_pos;
        if ( size == 0 ) break;
        if ( color[global_e] != BLACK ) {
            value[global_e] = global_v, color[global_e] = BLACK;
            if ( size > 0 ) {
                --size;
                para.Q[global_pos] = para.Q[size];
                para.val[global_pos] = para.val[size];
                hash[para.Q[global_pos]] = & para.val[global_pos];
                update = 1;
            }
            for ( int i = 0; i < MAXDEG; ++i ) {
                int ee = ;// new element;
                int vv = ;// value of new element;
                if ( /* if new element is valid */ ) {
                    if ( color[ee] == WHITE ) { // WHITE means ee is not in the queue
                        para.Q[size] = ee;
                        para.val[size] = vv;
                        hash[ee] = &para.val[size];
                        ++size, color[ee] = GRAY;
                    } else {
                        *hash[ee] = vv;
                    }
                    update = 1;
                }
            }
        }
    }
    free ( Q );
    pthread_cancel ( tid );
    printf ( "Computation finishes!!!" );
    return ;
}

Solution

  • That's not a deadlock but a race condition.

    The overall structure of your hang is, you start with WHITE item at index 0 and this loop goes on forever:

    size = 1;
    while (size != 0) {
      if (WHITE) --size;
      for (...) {
        if (WHITE) ++size;
      }
    }
    

    The only way this changes is that your child thread would set the pos something else than 0. But your child thread depends on size to be greater than 1 to make it other than 0. There you have your race condition.

    My diagnosis may not be accurate. A cleaner code would help a lot. The names like Q, e, v would save you couple of keystrokes but can easily lose you days, as in this example. You also interchangeably use numbers and enums, a bad practice.