Tags: c, pthreads, glibc, dynamic-linking, musl

Mysterious segfaults when overriding pthread functions on glibc but not on musl


I'm trying to override pthread_create and pthread_exit. The overrides should call the originals.

I can override pthread_create, and it appears to work as long as I exit my main thread with pthread_exit(0). If I don't, it segfaults.

If I even attempt to override pthread_exit, I get segfaults.

My setup is below:

#!/bin/sh

cat > test.c <<EOF
#include <pthread.h>
#include <signal.h>
#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h> /* intptr_t */

void *thr(void *Arg)
{
    printf("i=%d\n", (int)(intptr_t)Arg);
    return 0;
}
int main()
{
    putchar('\n');
    pthread_t tids[4];
    for(int i=0; i < sizeof tids / sizeof tids[0]; i++){
        pthread_create(tids+i, 0, thr, (void*)(intptr_t)i);

    }
    pthread_exit(0); //SEGFAULTS if this isn't here
    return 0;
}
EOF
cat > pthread_override.c <<EOF

#define _GNU_SOURCE
#include <dlfcn.h>
#include <pthread.h>
#include <stdio.h>

#if 1
__attribute__((__visibility__("default")))
int pthread_create(
        pthread_t *restrict Thr, 
        pthread_attr_t const *Attr,
        void *(*Fn) (void *), 
        void *Arg
        )
{
    int r;
    int (*real_pthread_create)(
        pthread_t *restrict Thr, 
        pthread_attr_t const *Attr,
        void *(*Fn) (void *), 
        void *Arg
    ) = dlsym(RTLD_NEXT, "pthread_create");
    printf("CREATE BEGIN: %p\n", (void*)Thr);
    r = real_pthread_create(Thr, Attr, Fn, Arg);
    printf("CREATE END: %p\n", (void*)Thr);
    return r;
}
#endif

#if 0 
//SEGFAULTS if this is allowed
__attribute__((__visibility__("default")))
_Noreturn
void pthread_exit(void *Retval)
{
    __attribute__((__noreturn__)) void (*real_pthread_exit)( void *Arg);
    real_pthread_exit = dlsym(RTLD_NEXT, "pthread_exit");
    printf("%p\n", (void*)real_pthread_exit);
    puts("EXIT");
    real_pthread_exit(Retval);
}
#endif
EOF

: ${CC:=gcc}
$CC -g -fpic pthread_override.c -shared -o pthread.so -ldl
$CC -g test.c $PWD/pthread.so -ldl -lpthread 
./a.out

Can anyone explain to me what I'm doing wrong and what the reason for the segfaults is?

The problems completely disappear if I substitute musl-gcc for gcc.


Solution

  • Can anyone explain to me what I'm doing wrong and what the reason for the segfaults is?

    It's complicated.

    You are probably on Linux/x86_64, and being hit by this bug. See also this original report.

    Update:

    It turns out symbol versions have nothing to do with the problem (on x86_64, there are no multiple versions of pthread_create or pthread_exit).

    The issue is that gcc is configured to pass --as-needed to the linker.

    When you link with pthread_exit #ifdefed out, the a.out binary gets pthread_exit from libpthread.so.0, which is recorded as a NEEDED shared library:

    readelf -d a.out | grep libpthread
    0x0000000000000001 (NEEDED)             Shared library: [libpthread.so.0]
    

    When you #ifdef pthread_exit in, none of the real libpthread.so.0 symbols are needed anymore (the references are satisfied by pthread.so):

    readelf -d a.out | grep libpthread
    # no output!
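
    The same question (is the library that provides the real symbol present at all?) can also be asked at run time from inside the process. A minimal sketch, assuming glibc's dl_iterate_phdr from <link.h>; count_loaded_objects and struct needle are hypothetical names, not part of the original code:

```c
#define _GNU_SOURCE
#include <link.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper state: the substring to look for and a hit counter. */
struct needle {
    const char *substr;
    int hits;
};

/* Callback invoked once per loaded object (main program, shared libraries). */
static int match_object(struct dl_phdr_info *info, size_t size, void *data)
{
    struct needle *n = data;
    (void)size;
    if (info->dlpi_name && strstr(info->dlpi_name, n->substr))
        n->hits++;
    return 0; /* 0 means: keep iterating */
}

/* Count loaded objects whose path contains substr. */
static int count_loaded_objects(const char *substr)
{
    struct needle n = { substr, 0 };
    dl_iterate_phdr(match_object, &n);
    return n.hits;
}
```

    Calling count_loaded_objects("libpthread") from the wrapper would be expected to return 0 in the failing case. This depends on the glibc version, since newer releases fold libpthread into libc.so.6.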
    

    This then causes the dlsym to fail (there is no next symbol to return -- pthread.so defines the only one):

    Breakpoint 2, __dlsym (handle=0xffffffffffffffff, name=0x7ffff7bd8881 "pthread_create") at dlsym.c:56
    56  dlsym.c: No such file or directory.
    (gdb) fin
    Run till exit from #0  __dlsym (handle=0xffffffffffffffff, name=0x7ffff7bd8881 "pthread_create") at dlsym.c:56
    pthread_create (Thr=0x7fffffffdc80, Attr=0x0, Fn=0x40077d <thr>, Arg=0x0) at pthread_override.c:17
    17      int (*real_pthread_create)(
    Value returned is $1 = (void *) 0x0
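
    Since dlsym returns NULL here, the crash comes from calling through that null pointer. A defensive sketch that turns the segfault into a diagnostic; next_symbol_or_die is a hypothetical helper name, not part of the original code:

```c
#define _GNU_SOURCE
#include <dlfcn.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: look up the next definition of `name` in the
   library search order and abort with a message if there is none. */
static void *next_symbol_or_die(const char *name)
{
    void *sym = dlsym(RTLD_NEXT, name);
    if (!sym) {
        const char *err = dlerror(); /* may itself be NULL */
        fprintf(stderr, "dlsym(RTLD_NEXT, \"%s\") failed: %s\n",
                name, err ? err : "symbol not found");
        abort();
    }
    return sym;
}
```

    In the wrapper, assigning real_pthread_create = next_symbol_or_die("pthread_create"); would then stop with a readable message instead of jumping to address 0.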
    

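    Solution: add -Wl,--no-as-needed to the main application link line before -lpthread, so that libpthread.so.0 is recorded as NEEDED even when a.out does not directly reference any of its symbols. musl is unaffected because it implements the pthread functions inside libc.so, which is always loaded.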

    P.S. I am reminded about rule #3 from David Agans' book (which I highly recommend): Quit thinking and look.