Tags: c, multithreading, thread-local, thread-local-storage

Is it possible to deallocate C __thread thread-local memory once the thread has exited?


I have a thread-safe function, and I want to allocate a dynamic thread-local memory buffer that it can use independently in each thread and that is deallocated once the thread has exited. Here's the demo:

#include <pthread.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

void func_needs_storage(void) {
    static __thread void* tlb = NULL;

    if (!tlb)
        tlb = malloc(sizeof(int));  /* never freed: this is the leak in question */

    /* %lx expects unsigned long, so cast explicitly */
    printf("Thread id: %08lx, local tlb address is: %08lx\n",
            (unsigned long)(uintptr_t)pthread_self(), (unsigned long)(uintptr_t)tlb);
}

void* thread_func(void *arg) {
    (void)arg;  /* unused */
    for (int i = 0; i < 3; ++i)
        func_needs_storage();

    return NULL;
}

int main(void) {
    pthread_t threads[3];
    for (size_t i = 0; i < sizeof(threads) / sizeof(*threads); ++i)
        if (pthread_create(&threads[i], NULL, thread_func, NULL))
            return 1;

    for (size_t i = 0; i < sizeof(threads) / sizeof(*threads); ++i)
        if (pthread_join(threads[i], NULL))
            return 2;

    return 0;
}

Note that I can't allocate or free the memory in thread_func. The output is:

Thread id: 7efda7cdd700, local tlb address is: 7efda0000b20 <-- 1st thread
Thread id: 7efda7cdd700, local tlb address is: 7efda0000b20
Thread id: 7efda84de700, local tlb address is: 7efda0000f50 <-- 2nd thread
Thread id: 7efda84de700, local tlb address is: 7efda0000f50
Thread id: 7efda8cdf700, local tlb address is: 7efda0000f70 <-- 3rd thread
Thread id: 7efda8cdf700, local tlb address is: 7efda0000f70

It works like a charm, but unfortunately this code creates an inevitable memory leak :(

Here func_needs_storage() is a function that needs a temporary buffer to process some data, and it may be called many times. The memory it uses has to be dynamic and may be quite large (up to megabytes), so it can hardly be placed on the stack. I didn't want to allocate the buffer each time the function is called, so I stored the pointer to it in a thread-local static variable, unique for each thread.

The question is: is it possible to deallocate this thread-local memory buffer in C when the thread has exited and it is **guaranteed** that this memory won't be used anymore? Maybe I should use some pthread API, or declare my variable in some way other than __thread? The compiler is the latest gcc/clang; the OS is Arch Linux/FreeBSD.

As a reference example, in C++ I could wrap my buffer in a ThreadLocalStorage class with a destructor that deallocates its internal memory. Then, if I declare a static thread_local ThreadLocalStorage, its destructor will be called once the thread has exited.


Solution

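  • I'm back. I solved this with the pthread_key_create(), pthread_setspecific() and pthread_getspecific() functions, and I want to share my solution in the hope that it helps someone. The destructor passed to pthread_key_create() is called at thread exit for every thread whose value for the key is not NULL.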

    #include <pthread.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>

    static pthread_key_t key;
    static pthread_once_t key_once = PTHREAD_ONCE_INIT;

    /* Called at thread exit with the thread's non-NULL value for the key. */
    void key_destructor(void* tlb) {
        printf("Thread id: %08lx, deallocate local tlb address: %08lx\n",
            (unsigned long)(uintptr_t)pthread_self(), (unsigned long)(uintptr_t)tlb);

        free(tlb);
    }

    void make_key_once(void) {
        pthread_key_create(&key, key_destructor /* or just "free" */);
    }

    void func_needs_storage(void) {
        /* Create the key exactly once, no matter which thread gets here first. */
        pthread_once(&key_once, make_key_once);

        void* tlb = NULL;
        if ((tlb = pthread_getspecific(key)) == NULL)
        {
            /* First call in this thread: allocate and remember the buffer. */
            tlb = malloc(sizeof(int));
            pthread_setspecific(key, tlb);
        }

        printf("Thread id: %08lx, local tlb address is: %08lx\n",
            (unsigned long)(uintptr_t)pthread_self(), (unsigned long)(uintptr_t)tlb);
    }

    void* thread_func(void *arg) {
        (void)arg;  /* unused */
        for (int i = 0; i < 3; ++i)
            func_needs_storage();

        return NULL;
    }

    int main(void) {
        pthread_t threads[3];
        for (size_t i = 0; i < sizeof(threads) / sizeof(*threads); ++i)
            if (pthread_create(&threads[i], NULL, thread_func, NULL))
                return 1;

        for (size_t i = 0; i < sizeof(threads) / sizeof(*threads); ++i)
            if (pthread_join(threads[i], NULL))
                return 2;

        return 0;
    }
    

    The output is:

    Thread id: 881b309c0, local tlb address is: 200bb81e0
    Thread id: 881b309c0, local tlb address is: 200bb81e0
    Thread id: 881b309c0, local tlb address is: 200bb81e0
    Thread id: 881b309c0, deallocate local tlb address: 200bb81e0
    Thread id: 881b30e40, local tlb address is: 200bb81e0
    Thread id: 881b30e40, local tlb address is: 200bb81e0
    Thread id: 881b30e40, local tlb address is: 200bb81e0
    Thread id: 881b30e40, deallocate local tlb address: 200bb81e0
    Thread id: 881b312c0, local tlb address is: 200bb81e0
    Thread id: 881b312c0, local tlb address is: 200bb81e0
    Thread id: 881b312c0, local tlb address is: 200bb81e0
    Thread id: 881b312c0, deallocate local tlb address: 200bb81e0
    

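    The memory is reused: each thread exited and freed its buffer before the next one allocated, so the allocator handed out the same address again. Adding some delay to func_needs_storage() makes the threads overlap and shows that each thread gets its own buffer: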

    Thread id: 8815e49c0[0], local tlb address is: 20062c1e0{0}
    Thread id: 8815e52c0[1], local tlb address is: 20062c300{1}
    Thread id: 8815e4e40[2], local tlb address is: 20062c2e0{2}
    Thread id: 8815e4e40[2], local tlb address is: 20062c2e0{2}
    Thread id: 8815e49c0[0], local tlb address is: 20062c1e0{0}
    Thread id: 8815e52c0[1], local tlb address is: 20062c300{1}
    Thread id: 8815e49c0[0], local tlb address is: 20062c1e0{0}
    Thread id: 8815e49c0[0], deallocate local tlb address: 20062c1e0{0}
    Thread id: 8815e4e40[2], local tlb address is: 20062c2e0{2}
    Thread id: 8815e4e40[2], deallocate local tlb address: 20062c2e0{2}
    Thread id: 8815e52c0[1], local tlb address is: 20062c300{1}
    Thread id: 8815e52c0[1], deallocate local tlb address: 20062c300{1}
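
    The delayed variant itself is not shown above. A minimal sketch of how it might look, assuming a short nanosleep() after each call and a per-thread index passed through the thread argument. The names worker(), run_delayed_demo() and destroyed_count are hypothetical and only serve the illustration, and the bracketed numbers in the output above are assumed to be such an index:

    ```c
    #define _POSIX_C_SOURCE 200809L  /* for nanosleep() */
    #include <assert.h>
    #include <pthread.h>
    #include <stdatomic.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    #define MAX_THREADS 16

    static pthread_key_t key;
    static pthread_once_t key_once = PTHREAD_ONCE_INIT;
    static atomic_int destroyed_count;  /* hypothetical counter, for checking only */

    static void key_destructor(void *tlb) {
        atomic_fetch_add(&destroyed_count, 1);
        free(tlb);
    }

    static void make_key_once(void) {
        if (pthread_key_create(&key, key_destructor))
            abort();
    }

    static void func_needs_storage(int idx) {
        pthread_once(&key_once, make_key_once);

        void *tlb = pthread_getspecific(key);
        if (tlb == NULL) {
            tlb = malloc(sizeof(int));
            if (tlb == NULL || pthread_setspecific(key, tlb))
                abort();
        }
        printf("Thread [%d], local tlb address is: %p\n", idx, tlb);

        /* Assumed delay of 10 ms so that the threads are likely to overlap. */
        struct timespec delay = { 0, 10 * 1000 * 1000 };
        nanosleep(&delay, NULL);
    }

    static void *worker(void *arg) {
        int idx = (int)(intptr_t)arg;
        for (int i = 0; i < 3; ++i)
            func_needs_storage(idx);
        return NULL;
    }

    /* Starts nthreads workers, joins them, and returns how many destructors ran. */
    int run_delayed_demo(int nthreads) {
        pthread_t threads[MAX_THREADS];
        assert(nthreads > 0 && nthreads <= MAX_THREADS);

        atomic_store(&destroyed_count, 0);
        for (int i = 0; i < nthreads; ++i)
            if (pthread_create(&threads[i], NULL, worker, (void *)(intptr_t)i))
                return -1;
        for (int i = 0; i < nthreads; ++i)
            if (pthread_join(threads[i], NULL))
                return -1;
        return atomic_load(&destroyed_count);
    }
    ```

    Calling run_delayed_demo(3) from main() should report one destructor call per thread. The order of the printed lines depends on scheduling, so it will usually differ from run to run.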
    

    If you're not strictly limited to C (as I was) and are allowed to use C++ in your code, use the thread_local storage class specifier: the destructor of each thread_local object is called when its thread terminates, which makes it possible to release the memory the object owns.
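
    If you have to stay in C but want to keep the cheap __thread lookup, one possible variation, which is not part of the solution above, is to keep the pointer in a __thread variable and use the pthread key only to register the buffer for cleanup. A minimal sketch, assuming the gcc/clang __thread extension; get_thread_buffer(), cleanup(), freed_count and check_one_thread() are hypothetical names introduced for illustration:

    ```c
    #include <assert.h>
    #include <pthread.h>
    #include <stdatomic.h>
    #include <stdlib.h>

    static pthread_key_t cleanup_key;
    static pthread_once_t cleanup_once = PTHREAD_ONCE_INIT;
    static atomic_int freed_count;  /* hypothetical counter, for checking only */

    static void cleanup(void *buf) {
        atomic_fetch_add(&freed_count, 1);
        free(buf);
    }

    static void make_cleanup_key(void) {
        if (pthread_key_create(&cleanup_key, cleanup))
            abort();
    }

    /* Returns this thread's buffer, allocating it on the first call.
       Returns NULL if the allocation or the registration fails. */
    void *get_thread_buffer(size_t size) {
        static __thread void *tlb = NULL;  /* fast path: plain TLS read */
        assert(size > 0);

        if (tlb == NULL) {
            pthread_once(&cleanup_once, make_cleanup_key);
            void *buf = malloc(size);
            if (buf == NULL)
                return NULL;
            /* Register the pointer so that cleanup() runs at thread exit. */
            if (pthread_setspecific(cleanup_key, buf)) {
                free(buf);
                return NULL;
            }
            tlb = buf;
        }
        return tlb;
    }

    static void *probe(void *arg) {
        void *a = get_thread_buffer(64);
        void *b = get_thread_buffer(64);
        *(int *)arg = (a != NULL && a == b);
        return NULL;
    }

    /* Runs one thread; returns 1 if the buffer was reused and then freed once. */
    int check_one_thread(void) {
        pthread_t t;
        int same = 0;
        atomic_store(&freed_count, 0);
        if (pthread_create(&t, NULL, probe, &same) || pthread_join(t, NULL))
            return 0;
        return same && atomic_load(&freed_count) == 1;
    }
    ```

    The size argument only matters on the first call in each thread; later calls return the buffer that already exists.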

    Pro tip: take a closer look at your OS's thread management system (how it stores the TLS areas and how it handles thread termination). We use a fork of FreeBSD 9 with a very peculiar thread management system.

    Good luck!