c, multithreading, pthreads, posix, cancellation

Thread cancellation before calling join() gives an error


The POSIX standard says:

The lifetime of a thread ID ends after the thread terminates if it was created with the detachstate attribute set to PTHREAD_CREATE_DETACHED or if pthread_detach() or pthread_join() has been called for that thread.

In the following program, a single thread is created and executes the thread_task() routine. After the routine is done, the thread exits, but because its detachstate attribute is PTHREAD_CREATE_JOINABLE (the default), I would expect calling pthread_cancel() on this thread to be safe and not return any error. The program is somewhat lengthy because of the extensive error checking:

#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int counter=0;

void free_buffer(void* buff)
{
    printf("freeing buffer\n");
    free(buff);
}

void* thread_task(void* arg)
{
    void* buffer = malloc(1000);
    pthread_cleanup_push(free_buffer, buffer);

    for(int i = 0; i < 100000; i++) { // 'counter' is a global variable
        for(counter = 0; counter < 10000; counter++);
        pthread_testcancel();
    }

    pthread_cleanup_pop(1);
    printf("Thread exiting\n");
    return NULL;
}

int main()
{
    pthread_t tid;
    int errnum = pthread_create(&tid, NULL, thread_task, NULL);
    if(errnum != 0) {
        fprintf(stderr, "pthread_create(): %s\n", strerror(errnum));
        exit(EXIT_FAILURE);
    }    

    getchar();

    errnum = pthread_cancel(tid);
    if(errnum != 0) {
        fprintf(stderr, "pthread_cancel(): %s [%d]\n", strerror(errnum), errnum);
        exit(EXIT_FAILURE);
    } 

    void* ret;
    errnum = pthread_join(tid, &ret);
    if(errnum != 0) {
        fprintf(stderr, "pthread_join(): %s [%d]\n", strerror(errnum), errnum);
        exit(EXIT_FAILURE);
    } 

    if(ret == PTHREAD_CANCELED) {
        printf("Thread was canceled\n");
    }

    printf("counter = %d\n", counter);
}

That is not what happens, however. When I run the program, I see the following messages:

// wait for the thread routine to finish...
freeing buffer
Thread exiting
// press any key
pthread_cancel(): No such process [3]

This seems to suggest that after the thread exits, its TID is no longer valid. Doesn't this go against the standard? What's going on here?


Solution

  • I don't know about the IEEE standard, but IMO the pthreads(7) and pthread_cancel(3) man pages are ambiguous.

    The pthread_cancel(3) man page gives only one possible error code, ESRCH, which supposedly means "No thread with the ID thread could be found." Notice that it says "No thread ... could be found." It doesn't say "No such ID exists."

    The pthreads(7) man page guarantees that the ID of a non-detached thread remains valid and unique until the thread has been joined, but it doesn't say anything about whether the thread itself continues to "exist" (in the sense that pthread_cancel() cares about) just because its ID continues to exist.

    I ran the OP's code on a different platform, and pthread_cancel() did not return an error for me, even long after the thread had returned from the thread_task() function. IMO, there are cases to be made for both the OP's build toolchain and mine being "correct" in the sense of "compliant with the man pages."


    "I would expect calling pthread_cancel() on this thread to be safe and not return any error."

    What does "safe" mean? To me, pthread_cancel() would be "safe" if it were possible to write a guaranteed-reliable program that uses it. Having to assume that either behavior is possible complicates things, but I don't think it makes the task impossible. IMO, the worst it does is limit what you can learn from the error codes if your program bothers to log them.
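One way to write code that tolerates both behaviors is to treat ESRCH from pthread_cancel() as "the thread has already finished" and join it anyway. The sketch below is only an illustration: cancel_and_join(), finishes_quickly(), and runs_until_canceled() are hypothetical names that do not appear in the OP's program, and the helper assumes the thread is joinable and has not been joined or detached yet.

```c
#include <errno.h>
#include <pthread.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper (not part of the OP's program).
 * Assumption: 'tid' refers to a joinable thread that has not been joined
 * or detached yet, so its ID is still valid for pthread_join().
 * Returns 0 on success, or the error number of the call that failed. */
int cancel_and_join(pthread_t tid, int *was_canceled)
{
    int errnum = pthread_cancel(tid);

    /* Some platforms report ESRCH when the thread has already terminated.
     * That is not treated as fatal here: the thread still has to be joined. */
    if (errnum != 0 && errnum != ESRCH)
        return errnum;

    void *ret = NULL;
    errnum = pthread_join(tid, &ret);
    if (errnum != 0)
        return errnum;

    /* Only the join result says whether the cancellation took effect. */
    *was_canceled = (ret == PTHREAD_CANCELED);
    return 0;
}

/* Illustrative task: returns right away, like thread_task() once its loop is done. */
void *finishes_quickly(void *arg)
{
    (void)arg;
    return NULL;
}

/* Illustrative task: never returns on its own; sleep() is a cancellation point. */
void *runs_until_canceled(void *arg)
{
    (void)arg;
    for (;;)
        sleep(1);
    return NULL;
}
```

With a helper like this, main() in the OP's program could call cancel_and_join(tid, &was_canceled) after getchar() and would print "Thread was canceled" only when the cancellation really took effect, whether or not the platform returns ESRCH for a thread that has already exited.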