c multithreading unix pthreads pthread-join

Why is it a problem if a thread’s start function returns the same value as PTHREAD_CANCELED?

I'm reading Kerrisk's book and came across the following note:

Caution is required when using a cast integer as the return value of a thread’s start function. The reason for this is that PTHREAD_CANCELED, the value returned when a thread is canceled (see Chapter 32), is usually some implementation-defined integer value cast to void *. If a thread’s start function returns the same integer value, then, to another thread that is doing a pthread_join(), it will wrongly appear that the thread was canceled. In an application that employs thread cancellation and chooses to return cast integer values from a thread’s start functions, we must ensure that a normally terminating thread does not return an integer whose value matches PTHREAD_CANCELED on that Pthreads implementation. A portable application would need to ensure that normally terminating threads don’t return integer values that match PTHREAD_CANCELED on any of the implementations on which the application is to run.

I don't understand the importance of this note. Could you show a simple code snippet that illustrates it? What exactly is the issue in this case?

Solution

This is a typical definition of PTHREAD_CANCELED (quoted verbatim from /usr/include/pthread.h on the machine where I'm typing this, which runs Linux with GNU libc):

#define PTHREAD_CANCELED ((void *) -1)

So if you have code like this to check for cancellation:

void *thread_result;
int rv = pthread_join(child, &thread_result);
if (rv)
    error_exit("pthread_join failed", rv);
if (thread_result == PTHREAD_CANCELED)
    error_exit("thread canceled", 0);

you must not also have a thread procedure like this:

static void *appears_to_be_canceled(void *unused)
{
    return ((void *) -1);
}

because PTHREAD_CANCELED and ((void *) -1) are equal, so the joining thread cannot tell a normal return of that value apart from a cancellation. Note that the value is not guaranteed to be −1; it can differ from system to system, and there's no good way to find out what it is at compile time because ((void *)...) isn't usable in an #if expression.
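To make the collision concrete, here is a minimal sketch that runs a thread procedure, joins it, and reports whether the joiner would conclude the thread was canceled. The helper name `looks_canceled` and the two thread procedures are made up for this illustration; none of the threads is ever actually canceled.

```c
#include <pthread.h>
#include <stdint.h>
#include <stdio.h>

/* Terminates normally, but returns an integer cast to a pointer.
   On glibc this value happens to be identical to PTHREAD_CANCELED. */
static void *returns_minus_one(void *unused)
{
    (void) unused;
    return (void *) (intptr_t) -1;
}

/* Terminates normally with a result that cannot be mistaken
   for a cancellation. */
static void *returns_null(void *unused)
{
    (void) unused;
    return NULL;
}

/* Hypothetical helper: run `start` in a new thread, join it, and report
   whether the joiner would conclude that the thread was canceled.
   Returns 1 or 0, or -1 if a pthread call fails. */
static int looks_canceled(void *(*start)(void *))
{
    pthread_t child;
    void *thread_result;

    if (pthread_create(&child, NULL, start, NULL) != 0)
        return -1;
    if (pthread_join(child, &thread_result) != 0)
        return -1;
    return thread_result == PTHREAD_CANCELED;
}

/* Print what the joiner sees for both thread procedures. */
static void report(void)
{
    printf("returns_null looks canceled:      %d\n",
           looks_canceled(returns_null));
    printf("returns_minus_one looks canceled: %d\n",
           looks_canceled(returns_minus_one));
}
```

Call `report()` from `main` and build with `-pthread`. On a system where PTHREAD_CANCELED is `((void *) -1)`, the second line reports a cancellation that never happened; on a system with a different definition it would not, which is exactly why the behavior is not portable.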

There are two good ways to avoid this problem:

1. Don't use thread cancellation, so you don't have to check for PTHREAD_CANCELED and don't have to care what its numeric value is. This is a good idea for several other reasons, most importantly that cancellation makes it even harder to write robust multithreaded code than it already is.
2. Return only valid pointers from your thread procedures, not numbers. A good idiom is:
```
struct worker_data
{
   // put _everything_ your thread needs to access in here
};
static void *worker_proc (void *data_)
{
   struct worker_data *data = data_;
   // do stuff with `data` here 
   return data_;
}
```
Returning the worker_data object means the code that calls pthread_join doesn't have to track which worker_data object corresponds to which pthread_t. It also means the return value of a successfully completed thread cannot be equal to PTHREAD_CANCELED, because PTHREAD_CANCELED is guaranteed not to compare equal to any valid pointer.
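As a rough sketch of how the joining side might use this idiom, the following fills in the struct with two illustrative fields and adds a helper that creates the thread, joins it, and hands back the data. The fields `input` and `result`, the squaring work, and the helper name `run_worker` are assumptions for the example, not part of the idiom itself.

```c
#include <pthread.h>
#include <stddef.h>

struct worker_data
{
    int input;    /* set by the creator before the thread starts */
    long result;  /* filled in by the worker */
};

static void *worker_proc(void *data_)
{
    struct worker_data *data = data_;
    data->result = (long) data->input * data->input;
    return data_;  /* a valid pointer, so never equal to PTHREAD_CANCELED */
}

/* Hypothetical helper: run one worker to completion.
   Returns the worker's data on normal termination, or NULL if a
   pthread call failed or the thread was canceled. */
static struct worker_data *run_worker(struct worker_data *data)
{
    pthread_t child;
    void *thread_result;

    if (pthread_create(&child, NULL, worker_proc, data) != 0)
        return NULL;
    if (pthread_join(child, &thread_result) != 0)
        return NULL;
    if (thread_result == PTHREAD_CANCELED)
        return NULL;
    return thread_result;
}
```

Because the joiner gets the struct back from pthread_join, it can read `result` directly without keeping a separate table that maps each pthread_t to its worker_data.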