I have a much larger piece of proprietary multithreaded software (which I cannot share) that is reporting a data race under helgrind. Since I cannot post that code, I have devised some tests that demonstrate the race.
The race reported in the actual software:
==7746== Possible data race during write of size 1 at 0xAC83697 by thread #4
==7746== Locks held: 2, at addresses 0x583BCD8 0x5846F58
==7746== at 0x4C3A3CC: mempcpy (in /usr/lib/valgrind/vgpreload_helgrind-amd64-linux.so)
==7746== by 0x401375F: _dl_allocate_tls_init (dl-tls.c:515)
==7746== by 0x5053CED: get_cached_stack (allocatestack.c:254)
==7746== by 0x5053CED: allocate_stack (allocatestack.c:501)
==7746== by 0x5053CED: pthread_create@@GLIBC_2.2.5 (pthread_create.c:539)
==7746== by 0x4C34BB7: ??? (in /usr/lib/valgrind/vgpreload_helgrind-amd64-linux.so)
==7746== by 0x40BFA6: <redacted symbol names from private project>
==7746== by 0x4C34DB6: ??? (in /usr/lib/valgrind/vgpreload_helgrind-amd64-linux.so)
==7746== by 0x50536B9: start_thread (pthread_create.c:333)
==7746==
==7746== This conflicts with a previous write of size 1 by thread #10
==7746== Locks held: none
==7746== at 0x5053622: start_thread (pthread_create.c:265)
==7746== Address 0xac83697 is in a rw- anonymous segment
==7746==
This data race comes up when the software shuts down a series of threads and then launches new threads in the same thread pool. Unfortunately I cannot provide any of that code; however, I believe I have been able to construct several examples that demonstrate the problem.
I've found three other questions which relate to this problem:
1. The answer to the first was to manually set/allocate the stack. I don't believe this is a viable answer (a sketch of what I understand that workaround to look like follows this list), and if it is, can somebody explain why?
2. The answer to the second had no effect.
3. The third had no answer at all.
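For reference, here is a minimal sketch of what I understand the "allocate the stack yourself" workaround to mean. The function name launch_with_own_stack and the 1 MiB size are my own choices, not from that answer; the stack must outlive the thread and is never freed here, which is part of why I don't think this is viable.
#include <pthread.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical variant of the launch() helper used in the examples below:
 * handing pthread_create() a caller-owned stack is supposed to stop glibc
 * from reusing a cached stack left behind by an earlier detached thread. */
static void launch_with_own_stack(pthread_t *t, void *(*fn)(void *), void *arg)
{
    pthread_attr_t attr;
    size_t stack_size = 1024 * 1024;   /* arbitrary; must be >= PTHREAD_STACK_MIN */
    void *stack = NULL;

    /* page-aligned allocation, deliberately never freed in this sketch */
    posix_memalign(&stack, (size_t)sysconf(_SC_PAGESIZE), stack_size);

    pthread_attr_init(&attr);
    pthread_attr_setstack(&attr, stack, stack_size);
    pthread_create(t, &attr, fn, arg);
    pthread_detach(*t);
    pthread_attr_destroy(&attr);
}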
EDIT: I have added another (less complex) example at the bottom of this post which can also reproduce the problem.
I was able to rewrite the example given in the first question into a (mostly) minimal reproducible example.
The following piece of code generates the data race below about 85% of the time it is run on my machine (Ubuntu 16.04.6 LTS).
Run with:
gcc -g ./test.c -o test -lpthread && valgrind --tool=helgrind ./test
==15656== Possible data race during write of size 1 at 0x5C27697 by thread #4
==15656== Locks held: none
==15656== at 0x4C3A3CC: mempcpy (in /usr/lib/valgrind/vgpreload_helgrind-amd64-linux.so)
==15656== by 0x401375F: _dl_allocate_tls_init (dl-tls.c:515)
==15656== by 0x4E47CED: get_cached_stack (allocatestack.c:254)
==15656== by 0x4E47CED: allocate_stack (allocatestack.c:501)
==15656== by 0x4E47CED: pthread_create@@GLIBC_2.2.5 (pthread_create.c:539)
==15656== by 0x4C34BB7: ??? (in /usr/lib/valgrind/vgpreload_helgrind-amd64-linux.so)
==15656== by 0x400832: launch (test3.c:22)
==15656== by 0x4008FC: threadfn3 (test3.c:48)
==15656== by 0x4C34DB6: ??? (in /usr/lib/valgrind/vgpreload_helgrind-amd64-linux.so)
==15656== by 0x4E476B9: start_thread (pthread_create.c:333)
==15656==
==15656== This conflicts with a previous write of size 1 by thread #2
==15656== Locks held: none
==15656== at 0x4E47622: start_thread (pthread_create.c:265)
==15656== Address 0x5c27697 is in a rw- anonymous segment
Here is the program I constructed to reproduce the problem. The semaphores are not strictly necessary, but they seem to greatly increase the chance of the data race occurring.
#include <semaphore.h>
#include <pthread.h>
#include <stdlib.h>
#include <stdio.h>

pthread_t t1;
pthread_t t2;
pthread_t t3;
pthread_t t4;

void *threadfn1(void *p);
void *threadfn2(void *p);
void *threadfn3(void *p);
void *threadfn4(void *p);

sem_t sem;
sem_t sem2;
sem_t sem3;

// create a thread and immediately detach it, so it is never joined
void launch(pthread_t *t, void *(*fn)(void *), void *arg)
{
    pthread_create(t, NULL, fn, arg);
    pthread_detach(*t);
}

void *threadfn1(void *p)
{
    launch(&t2, threadfn2, NULL);
    printf("1 %p\n", p);
    // notify threadfn3 we are done
    sem_post(&sem);
    return NULL;
}

void *threadfn2(void *p)
{
    launch(&t3, threadfn3, NULL);
    printf("2 %p\n", p);
    // notify threadfn4 we are done
    sem_post(&sem2);
    return NULL;
}

void *threadfn3(void *p)
{
    // wait for threadfn1 to finish
    sem_wait(&sem);
    launch(&t4, threadfn4, NULL);
    // wait for threadfn4 to finish
    sem_wait(&sem3);
    printf("3 %p\n", p);
    return NULL;
}

void *threadfn4(void *p)
{
    // wait for threadfn2 to finish
    sem_wait(&sem2);
    printf("4 %p\n", p);
    // notify threadfn3 we are done
    sem_post(&sem3);
    return NULL;
}

int main()
{
    sem_init(&sem, 0, 0);
    sem_init(&sem2, 0, 0);
    sem_init(&sem3, 0, 0);
    launch(&t1, threadfn1, NULL);
    printf("main\n");
    // exit only the main thread so the detached threads can keep running
    pthread_exit(NULL);
}
It appears to be related to threads ending before their parents (or their parents' parents) have ended, but ultimately I have not been able to track down exactly what causes the data race.
It should also be noted that another data race came up a few times during my testing. I was not able to reproduce it reliably; it only appeared on occasion, with no obvious trigger. It is the same race as the one listed above, except that the conflicting access shows more of a stack trace than just "start_thread". It looks exactly like the race reported in the first question above, ending in __libc_thread_freeres:
==15973== Possible data race during write of size 1 at 0x5C27697 by thread #4
==15973== Locks held: none
==15973== at 0x4C3A3CC: mempcpy (in /usr/lib/valgrind/vgpreload_helgrind-amd64-linux.so)
==15973== by 0x401375F: _dl_allocate_tls_init (dl-tls.c:515)
==15973== by 0x4E47CED: get_cached_stack (allocatestack.c:254)
==15973== by 0x4E47CED: allocate_stack (allocatestack.c:501)
==15973== by 0x4E47CED: pthread_create@@GLIBC_2.2.5 (pthread_create.c:539)
==15973== by 0x4C34BB7: ??? (in /usr/lib/valgrind/vgpreload_helgrind-amd64-linux.so)
==15973== by 0x400832: launch (test3.c:22)
==15973== by 0x4008FC: threadfn3 (test3.c:48)
==15973== by 0x4C34DB6: ??? (in /usr/lib/valgrind/vgpreload_helgrind-amd64-linux.so)
==15973== by 0x4E476B9: start_thread (pthread_create.c:333)
==15973==
==15973== This conflicts with a previous read of size 1 by thread #2
==15973== Locks held: none
==15973== at 0x51C10B1: res_thread_freeres (in /lib/x86_64-linux-gnu/libc-2.19.so)
==15973== by 0x51C1061: __libc_thread_freeres (in /lib/x86_64-linux-gnu/libc-2.19.so)
==15973== by 0x4E45199: start_thread (pthread_create.c:329)
==15973== by 0x515547C: clone (clone.S:111)
No, I cannot simply join the threads; that would not work for the software exhibiting the problem.
UPDATE: I have been doing some more testing and have managed to produce another example that triggers the problem with much less code: simply launching threads and detaching them in a loop causes the data race.
#include <pthread.h>
#include <stdio.h>

// seems we only need 3 threads to cause the problem
#define NUM_THREADS 3

pthread_t t1[NUM_THREADS] = {0};

// create a thread and immediately detach it
void launch(pthread_t *t, void *(*fn)(void *), void *arg)
{
    pthread_create(t, NULL, fn, arg);
    pthread_detach(*t);
}

void *threadfn(void *p)
{
    return NULL;
}

int main()
{
    int i = NUM_THREADS;
    while (i-- > 0) {
        launch(t1 + i, threadfn, NULL);
    }
    return 0;
}
UPDATE 2: I have found that launching all of the threads BEFORE detaching any of them appears to prevent the race from manifesting. The following block of code does not generate a race report:
#include <pthread.h>

#define NUM_THREADS 3

pthread_t t1[NUM_THREADS] = {0};

void launch(pthread_t *t, void *(*fn)(void *), void *arg)
{
    pthread_create(t, NULL, fn, arg);
}

void *threadfn(void *p)
{
    return NULL;
}

int main()
{
    int i;

    // create all of the threads first...
    for (i = 0; i < NUM_THREADS; ++i) {
        launch(t1 + i, threadfn, NULL);
    }
    // ...and only then detach them
    for (i = 0; i < NUM_THREADS; ++i) {
        pthread_detach(t1[i]);
    }
    pthread_exit(NULL);
}
If you add another pthread_create() call after any of the pthread_detach() calls (see the modified example below), the race condition reappears. This leaves me feeling like it is impossible to call pthread_detach() and subsequently call pthread_create() without triggering this data race.
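Concretely, the kind of modification I mean is something like this: the non-racing program above with one extra create/detach pair added after the detach loop, which on my machine is enough to bring the report back.
#include <pthread.h>

#define NUM_THREADS 3

pthread_t t1[NUM_THREADS] = {0};
pthread_t extra;

void *threadfn(void *p)
{
    return NULL;
}

int main()
{
    int i;

    for (i = 0; i < NUM_THREADS; ++i) {
        pthread_create(t1 + i, NULL, threadfn, NULL);
    }
    for (i = 0; i < NUM_THREADS; ++i) {
        pthread_detach(t1[i]);
    }

    /* The only change: one more pthread_create() after the detaches.
       This is enough to make the get_cached_stack report come back. */
    pthread_create(&extra, NULL, threadfn, NULL);
    pthread_detach(extra);

    pthread_exit(NULL);
}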
In the end I just restructured everything so that I could join my threads; I really don't see how detached threads could be used here without triggering this data race.
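For completeness, the restructured shape is roughly what you would expect: every thread stays joinable and is reaped by its creator. A trimmed-down sketch of that version, which no longer produces the get_cached_stack report for me:
#include <pthread.h>

#define NUM_THREADS 3

pthread_t t1[NUM_THREADS] = {0};

void *threadfn(void *p)
{
    return NULL;
}

int main()
{
    int i;

    for (i = 0; i < NUM_THREADS; ++i) {
        pthread_create(t1 + i, NULL, threadfn, NULL);
    }

    /* Join instead of detach: each thread is reaped by its creator,
       so no stack is recycled behind helgrind's back. */
    for (i = 0; i < NUM_THREADS; ++i) {
        pthread_join(t1[i], NULL);
    }

    return 0;
}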