
What is the problem with PTHREAD_CANCELED and a thread's start function return value?


I'm reading Kerrisk's book and came across the following note:

Caution is required when using a cast integer as the return value of a thread’s start function. The reason for this is that PTHREAD_CANCELED, the value returned when a thread is canceled (see Chapter 32), is usually some implementation-defined integer value cast to void *. If a thread’s start function returns the same integer value, then, to another thread that is doing a pthread_join(), it will wrongly appear that the thread was canceled. In an application that employs thread cancellation and chooses to return cast integer values from a thread’s start functions, we must ensure that a normally terminating thread does not return an integer whose value matches PTHREAD_CANCELED on that Pthreads implementation. A portable application would need to ensure that normally terminating threads don’t return integer values that match PTHREAD_CANCELED on any of the implementations on which the application is to run.

I don't understand the importance of this note. Could you illustrate it with a simple code snippet? What exactly is the issue in these cases?


Solution

  • This is a typical definition of PTHREAD_CANCELED (quoted verbatim from /usr/include/pthread.h on the machine where I'm typing this, which runs Linux with GNU libc):

    #define PTHREAD_CANCELED ((void *) -1)
    

    So if you have code like this to check for cancellation:

    void *thread_result;
    int rv = pthread_join(child, &thread_result);
    if (rv)
        error_exit("pthread_join failed", rv);
    if (thread_result == PTHREAD_CANCELED)
        error_exit("thread canceled", 0);
    

    you must not also have a thread procedure like this:

    static void *appears_to_be_canceled(void *unused)
    {
        return ((void *) -1);
    }
    

    because PTHREAD_CANCELED and ((void *) -1) compare equal, so the joining thread cannot tell a normal return of −1 from a cancellation. Note that the value is not guaranteed to be −1; it can differ from system to system, and there's no good way to find out what it is at compile time because ((void *)...) isn't usable in an #if expression.
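To make the collision concrete, here is a minimal sketch that joins two threads: one that really is canceled, and one that terminates normally but returns the integer −1 cast to a pointer. The names `looks_canceled`, `returns_minus_one`, and `sleeps_until_canceled` are made up for this illustration; only the pthread calls themselves are real API.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>

/* Terminates normally, but returns the integer -1 cast to a pointer. */
static void *returns_minus_one(void *unused)
{
    (void) unused;
    return (void *) (intptr_t) -1;
}

/* Never returns on its own; sleep() is a cancellation point, so a
   pending cancellation request is acted on there. */
static void *sleeps_until_canceled(void *unused)
{
    (void) unused;
    for (;;)
        sleep(1);
    return NULL; /* not reached */
}

/* Hypothetical helper: start a thread, optionally cancel it, then join it.
   Returns 1 if the join result equals PTHREAD_CANCELED, 0 if it does not,
   and -1 if one of the pthread calls failed. */
static int looks_canceled(void *(*start)(void *), int cancel_it)
{
    pthread_t child;
    void *result;

    assert(start != NULL);

    if (pthread_create(&child, NULL, start, NULL) != 0)
        return -1;
    if (cancel_it && pthread_cancel(child) != 0) {
        /* Joining could block forever here, so give the thread up instead. */
        pthread_detach(child);
        return -1;
    }
    if (pthread_join(child, &result) != 0)
        return -1;

    return result == PTHREAD_CANCELED;
}
```

Calling `looks_canceled(sleeps_until_canceled, 1)` should report a cancellation, as expected. The trouble is that `looks_canceled(returns_minus_one, 0)` reports one too on any system where PTHREAD_CANCELED is defined as `((void *) -1)`, even though that thread was never canceled.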

    There are two good ways to avoid this problem:

    • Don't use thread cancellation, so you don't have to check for PTHREAD_CANCELED and don't have to care what its numeric value is. This is a good idea for several other reasons, most importantly that cancellation makes it even harder to write robust multithreaded code than it already is.
    • Return only valid pointers from your thread procedures, not numbers. A good idiom is:

      struct worker_data
      {
         // put _everything_ your thread needs to access in here
      };
      static void *worker_proc (void *data_)
      {
         struct worker_data *data = data_;
         // do stuff with `data` here 
         return data_;
      }
      

      Returning the worker_data object means the code that calls pthread_join doesn't have to track which worker_data object corresponds to which pthread_t. It also means the return value of a successfully completed thread can never equal PTHREAD_CANCELED, because POSIX specifies that PTHREAD_CANCELED matches no pointer to an object in memory.
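A filled-in version of that idiom might look like the sketch below. The `input` and `output` fields, the squaring work, `NUM_WORKERS`, and the `sum_of_squares` driver are all assumptions made up for the example; the point is only that the joiner recovers each result through the pointer the thread returns, and that the PTHREAD_CANCELED check stays unambiguous.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define NUM_WORKERS 4

/* Everything a worker reads or writes lives in here (fields are illustrative). */
struct worker_data
{
    int input;
    int output;
};

static void *worker_proc(void *data_)
{
    struct worker_data *data = data_;
    data->output = data->input * data->input;
    return data_;   /* a valid pointer, so never equal to PTHREAD_CANCELED */
}

/* Hypothetical driver: squares 1..NUM_WORKERS in separate threads and
   returns the sum, or -1 if a thread could not be created or joined,
   or was reported as canceled. */
static int sum_of_squares(void)
{
    pthread_t threads[NUM_WORKERS];
    struct worker_data data[NUM_WORKERS];
    int started = 0, sum = 0, failed = 0;

    for (int i = 0; i < NUM_WORKERS; i++) {
        data[i].input = i + 1;
        data[i].output = 0;
        if (pthread_create(&threads[i], NULL, worker_proc, &data[i]) != 0) {
            failed = 1;
            break;
        }
        started++;
    }

    /* Join every thread that was started, even after a failure. */
    for (int i = 0; i < started; i++) {
        void *result;
        if (pthread_join(threads[i], &result) != 0
            || result == PTHREAD_CANCELED) {
            failed = 1;
            continue;
        }
        /* The returned pointer identifies the finished job directly. */
        struct worker_data *done = result;
        assert(done != NULL);
        sum += done->output;
    }

    return failed ? -1 : sum;
}
```

Here the worker_data objects live on the joiner's stack, which is safe only because every started thread is joined before `sum_of_squares` returns; heap allocation would be the usual choice if the data had to outlive the caller.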