I'm trying to override pthread_create and pthread_exit. The overrides should call the originals.
I can override pthread_create, and it appears to work as long as I exit my main thread with pthread_exit(0). If I don't, it segfaults.
If I even attempt to override pthread_exit, I get segfaults.
My setup is below:
#!/bin/sh
cat > test.c <<EOF
#include <pthread.h>
#include <signal.h>
#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>
void *thr(void *Arg)
{
    printf("i=%d\n", (int)(intptr_t)Arg);
    return 0;
}
int main()
{
    putchar('\n');
    pthread_t tids[4];
    for(int i=0; i < sizeof tids / sizeof tids[0]; i++){
        pthread_create(tids+i, 0, thr, (void*)(intptr_t)i);
    }
    pthread_exit(0); //SEGFAULTS if this isn't here
    return 0;
}
EOF
cat > pthread_override.c <<EOF
#define _GNU_SOURCE
#include <dlfcn.h>
#include <pthread.h>
#include <stdio.h>
#if 1
__attribute__((__visibility__("default")))
int pthread_create(
    pthread_t *restrict Thr,
    pthread_attr_t const *Attr,
    void *(*Fn) (void *),
    void *Arg
    )
{
    int r;
    int (*real_pthread_create)(
        pthread_t *restrict Thr,
        pthread_attr_t const *Attr,
        void *(*Fn) (void *),
        void *Arg
        ) = dlsym(RTLD_NEXT, "pthread_create");
    printf("CREATE BEGIN: %p\n", (void*)Thr);
    r = real_pthread_create(Thr, Attr, Fn, Arg);
    printf("CREATE END: %p\n", (void*)Thr);
    return r;
}
#endif
#if 0
//SEGFAULTS if this is allowed
__attribute__((__visibility__("default")))
_Noreturn
void pthread_exit(void *Retval)
{
    __attribute__((__noreturn__)) void (*real_pthread_exit)( void *Arg);
    real_pthread_exit = dlsym(RTLD_NEXT, "pthread_exit");
    printf("%p\n", (void*)real_pthread_exit);
    puts("EXIT");
    real_pthread_exit(Retval);
}
#endif
EOF
: ${CC:=gcc}
$CC -g -fpic pthread_override.c -shared -o pthread.so -ldl
$CC -g test.c $PWD/pthread.so -ldl -lpthread
./a.out
Can anyone explain to me what I'm doing wrong and what the reason for the segfaults is?
The problems completely disappear if I substitute musl-gcc for gcc.
Can anyone explain to me what I'm doing wrong and what the reason for the segfaults is?
It's complicated.
You are probably on Linux/x86_64, and being hit by this bug. See also this original report.
Update:
It turns out symbol versions have nothing to do with the problem (on x86_64, there are no multiple versions of pthread_create or pthread_exit).
The issue is that gcc is configured to pass --as-needed to the linker.
When you link with pthread_exit #ifdef'ed out, the a.out binary gets pthread_exit from libpthread.so.0, which is recorded as a NEEDED shared library:
readelf -d a.out | grep libpthread
0x0000000000000001 (NEEDED) Shared library: [libpthread.so.0]
When you #ifdef pthread_exit in, none of the real libpthread.so.0 symbols are needed anymore (the references are satisfied by pthread.so):
readelf -d a.out | grep libpthread
# no output!
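The same check can be made from inside the running process. The following is a minimal sketch, assuming glibc on Linux, that uses dl_iterate_phdr to count the loaded shared objects whose path contains a given substring. The names visit_object and count_loaded_objects are made up for this illustration.

```c
#define _GNU_SOURCE
#include <link.h>
#include <stdio.h>
#include <string.h>

/* State passed through dl_iterate_phdr to the callback. */
struct match {
    const char *needle;  /* substring to look for in each object's path */
    int count;           /* number of loaded objects that matched */
};

/* Called once per loaded object (main program, shared libraries, vDSO). */
static int visit_object(struct dl_phdr_info *info, size_t size, void *data)
{
    struct match *m = data;
    (void)size;
    /* dlpi_name is an empty string for the main program. */
    if (info->dlpi_name && strstr(info->dlpi_name, m->needle)) {
        fprintf(stderr, "loaded: %s\n", info->dlpi_name);
        m->count++;
    }
    return 0;  /* returning 0 tells dl_iterate_phdr to keep going */
}

/* Hypothetical helper: how many loaded objects have `needle` in their path? */
static int count_loaded_objects(const char *needle)
{
    struct match m = { needle, 0 };
    dl_iterate_phdr(visit_object, &m);
    return m.count;
}
```

Calling count_loaded_objects("libpthread") at the top of main in test.c should give 0 in the failing build and 1 once libpthread.so.0 is recorded as NEEDED again. On glibc 2.34 and later the pthread functions live in libc.so.6 itself, so this check is only meaningful on older glibc versions that still ship a separate libpthread.so.0.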
Because libpthread.so.0 is never loaded, the dlsym call then fails (there is no next symbol to return; pthread.so defines the only one):
Breakpoint 2, __dlsym (handle=0xffffffffffffffff, name=0x7ffff7bd8881 "pthread_create") at dlsym.c:56
56 dlsym.c: No such file or directory.
(gdb) fin
Run till exit from #0 __dlsym (handle=0xffffffffffffffff, name=0x7ffff7bd8881 "pthread_create") at dlsym.c:56
pthread_create (Thr=0x7fffffffdc80, Attr=0x0, Fn=0x40077d <thr>, Arg=0x0) at pthread_override.c:17
17 int (*real_pthread_create)(
Value returned is $1 = (void *) 0x0
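Since dlsym returned NULL here, the override goes on to call through a null function pointer, which would explain the segfault. A defensive version of the override, sketched here under the assumption that reporting an error is acceptable, checks the lookup result before using it. The helper name lookup_next and the choice of ENOSYS as the error code are illustrative and not part of the original code.

```c
#define _GNU_SOURCE
#include <dlfcn.h>
#include <errno.h>
#include <pthread.h>
#include <stdio.h>

/* Hypothetical helper: find the next definition of `name` after this
   object and report a failed lookup instead of failing silently. */
static void *lookup_next(const char *name)
{
    void *sym;
    const char *err;

    dlerror();                       /* clear any stale error state */
    sym = dlsym(RTLD_NEXT, name);
    if (!sym) {
        err = dlerror();
        fprintf(stderr, "dlsym(RTLD_NEXT, \"%s\") failed: %s\n",
                name, err ? err : "symbol not found");
    }
    return sym;
}

typedef int (*pthread_create_fn)(pthread_t *restrict,
                                 const pthread_attr_t *restrict,
                                 void *(*)(void *), void *restrict);

__attribute__((__visibility__("default")))
int pthread_create(pthread_t *restrict Thr,
                   const pthread_attr_t *restrict Attr,
                   void *(*Fn)(void *), void *restrict Arg)
{
    /* NULL when no later object in the search order defines
       pthread_create, as in the gdb session above. */
    pthread_create_fn real_pthread_create =
        (pthread_create_fn)lookup_next("pthread_create");

    if (!real_pthread_create)
        return ENOSYS;               /* error code chosen for this sketch only */
    return real_pthread_create(Thr, Attr, Fn, Arg);
}
```

This does not repair the link; it only turns the crash into an error message that points at the missing symbol.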
Solution: add -Wl,--no-as-needed to the main application link line before -lpthread.
P.S. I am reminded about rule #3 from David Agans' book (which I highly recommend): Quit thinking and look.