I've been fighting with this problem for quite some time, and I've been unable to find a solution or even an explanation for it. So sorry if the question is long, but bear with me as I just want to make it 100% clear in the hopes that someone more experienced than me will be able to figure it out.
I'm keeping the C syntax highlight on for all snippets because it makes them a little bit clearer even if not really correct.
I have a C program which uses some functions from a dynamic library (libzip). Here it is boiled down to a minimal reproducible example (it basically does nothing, but it works just fine):
#include <zip.h>

int main(void) {
    int err;
    zip_t *myzip;

    myzip = zip_open("myzip.zip", ZIP_CREATE | ZIP_TRUNCATE, &err);
    if (myzip == NULL)
        return 1;

    zip_close(myzip);
    return 0;
}
Normally, to compile it, I would simply do:
gcc -c prog.c
gcc -o prog prog.o -lzip
This creates, as expected, an ELF which requires libzip to run:
$ ldd prog
linux-vdso.so.1 (0x00007ffdafb53000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f81eedc7000)
/lib64/ld-linux-x86-64.so.2 (0x00007f81ef780000)
libzip.so.4 => /usr/lib/x86_64-linux-gnu/libzip.so.4 (0x00007f81ef166000)
libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1 (0x00007f81eebad000)
(libz is just a dependency of libzip)
What I really want to do, though, is load the library myself using dlopen(). Pretty simple task, no? Well yes, or at least so I thought.
To achieve this, I should just need to call dlopen and let the loader do its job:
#include <zip.h>
#include <dlfcn.h>

int main(void) {
    void *lib;
    int err;
    zip_t *myzip;

    lib = dlopen("libzip.so", RTLD_LAZY | RTLD_GLOBAL);
    if (lib == NULL)
        return 1;

    myzip = zip_open("myzip.zip", ZIP_CREATE | ZIP_TRUNCATE, &err);
    if (myzip == NULL)
        return 1;

    zip_close(myzip);
    return 0;
}
Of course, since I want to manually load the library myself, I will not link it this time:
# Create prog.o
gcc -c prog.c
# Do a dry-run just to make sure all symbols are resolved
gcc -o /dev/null prog.o -ldl -lzip
# Now link again, this time only against libdl
gcc -o prog prog.o -ldl -Wl,--unresolved-symbols=ignore-in-object-files
The flag --unresolved-symbols=ignore-in-object-files tells ld to not worry about my prog.o having unresolved symbols at link time (I want to take care of that myself at runtime).
The above Should Just Work™, and indeed it does seem to... but I have two machines, and being the pedantic nerd I am I just thought "well, better make sure and compile it on both of them".
x86-64, Linux 4.9, Debian 9, gcc 6.3.0, ld 2.28. Here everything works as expected.
I can clearly see that the symbols are there:
$ readelf --dyn-syms prog
Symbol table '.dynsym' contains 15 entries:
Num: Value Size Type Bind Vis Ndx Name
0: 0000000000000000 0 NOTYPE LOCAL DEFAULT UND
1: 0000000000000000 0 NOTYPE WEAK DEFAULT UND _ITM_deregisterTMCloneTab
2: 0000000000000000 0 FUNC GLOBAL DEFAULT UND __libc_start_main@GLIBC_2.2.5 (2)
3: 0000000000000000 0 NOTYPE WEAK DEFAULT UND __gmon_start__
===> 4: 0000000000000000 0 FUNC GLOBAL DEFAULT UND zip_close
5: 0000000000000000 0 FUNC GLOBAL DEFAULT UND dlopen@GLIBC_2.2.5 (3)
===> 6: 0000000000000000 0 FUNC GLOBAL DEFAULT UND zip_open
7: 0000000000000000 0 NOTYPE WEAK DEFAULT UND _Jv_RegisterClasses
8: 0000000000000000 0 NOTYPE WEAK DEFAULT UND _ITM_registerTMCloneTable
9: 0000000000000000 0 FUNC WEAK DEFAULT UND __cxa_finalize@GLIBC_2.2.5 (2)
10: 0000000000201040 0 NOTYPE GLOBAL DEFAULT 25 _edata
11: 0000000000201048 0 NOTYPE GLOBAL DEFAULT 26 _end
12: 0000000000201040 0 NOTYPE GLOBAL DEFAULT 26 __bss_start
13: 00000000000006a0 0 FUNC GLOBAL DEFAULT 11 _init
14: 0000000000000924 0 FUNC GLOBAL DEFAULT 15 _fini
The PLT entries are also there as expected and look fine:
$ objdump -j .plt -M intel -d prog
Disassembly of section .plt:
00000000000006c0 <.plt>:
6c0: ff 35 42 09 20 00 push QWORD PTR [rip+0x200942] # 201008 <_GLOBAL_OFFSET_TABLE_+0x8>
6c6: ff 25 44 09 20 00 jmp QWORD PTR [rip+0x200944] # 201010 <_GLOBAL_OFFSET_TABLE_+0x10>
6cc: 0f 1f 40 00 nop DWORD PTR [rax+0x0]
00000000000006d0 <zip_close@plt>:
6d0: ff 25 42 09 20 00 jmp QWORD PTR [rip+0x200942] # 201018 <zip_close>
6d6: 68 00 00 00 00 push 0x0
6db: e9 e0 ff ff ff jmp 6c0 <.plt>
00000000000006e0 <dlopen@plt>:
6e0: ff 25 3a 09 20 00 jmp QWORD PTR [rip+0x20093a] # 201020 <dlopen@GLIBC_2.2.5>
6e6: 68 01 00 00 00 push 0x1
6eb: e9 d0 ff ff ff jmp 6c0 <.plt>
00000000000006f0 <zip_open@plt>:
6f0: ff 25 32 09 20 00 jmp QWORD PTR [rip+0x200932] # 201028 <zip_open>
6f6: 68 02 00 00 00 push 0x2
6fb: e9 c0 ff ff ff jmp 6c0 <.plt>
And the program runs without any problem:
$ ./prog
$ echo $?
0
Even looking inside it with a debugger I can clearly see the symbols getting correctly resolved like any normal dynamic symbol:
0x55555555479b <main+43> lea rax, [rbp - 0x14]
0x55555555479f <main+47> mov rdx, rax
0x5555555547a2 <main+50> mov esi, 9
0x5555555547a7 <main+55> lea rdi, [rip + 0xc0] <0x7ffff7ffd948>
0x5555555547ae <main+62> call zip_open@plt <0x555555554620>
|
v ### PLT entry:
0x555555554620 <zip_open@plt> jmp qword ptr [rip + 0x200a02] <0x555555755028>
|
v
0x555555554626 <zip_open@plt+6> push 2
0x55555555462b <zip_open@plt+11> jmp 0x5555555545f0
|
v ### PLT stub:
0x5555555545f0 push qword ptr [rip + 0x200a12] <0x555555755008>
0x5555555545f6 jmp qword ptr [rip + 0x200a14] <0x7ffff7def0d0>
|
v ### Symbol gets correctly resolved
0x7ffff7def0d0 <_dl_runtime_resolve_fxsave> push rbx
0x7ffff7def0d1 <_dl_runtime_resolve_fxsave+1> mov rbx, rsp
0x7ffff7def0d4 <_dl_runtime_resolve_fxsave+4> and rsp, 0xfffffffffffffff0
0x7ffff7def0d8 <_dl_runtime_resolve_fxsave+8> sub rsp, 0x240
x86-64, Linux 4.15, Ubuntu 18.04, gcc 7.4, ld 2.30. Here, something really strange is going on.
Compilation doesn't yield any warning or error, but I do not see the symbols:
$ readelf --dyn-syms prog
Symbol table '.dynsym' contains 7 entries:
Num: Value Size Type Bind Vis Ndx Name
0: 0000000000000000 0 NOTYPE LOCAL DEFAULT UND
1: 0000000000000000 0 NOTYPE WEAK DEFAULT UND _ITM_deregisterTMCloneTab
2: 0000000000000000 0 FUNC GLOBAL DEFAULT UND __libc_start_main@GLIBC_2.2.5 (2)
3: 0000000000000000 0 NOTYPE WEAK DEFAULT UND __gmon_start__
4: 0000000000000000 0 FUNC GLOBAL DEFAULT UND dlopen@GLIBC_2.2.5 (3)
5: 0000000000000000 0 NOTYPE WEAK DEFAULT UND _ITM_registerTMCloneTable
6: 0000000000000000 0 FUNC WEAK DEFAULT UND __cxa_finalize@GLIBC_2.2.5 (2)
The PLT entries are there, but they are filled with zeroes, and aren't even recognized by objdump:
$ objdump -j .plt -M intel -d prog
Disassembly of section .plt:
0000000000000560 <.plt>:
560: ff 35 4a 0a 20 00 push QWORD PTR [rip+0x200a4a] # 200fb0 <_GLOBAL_OFFSET_TABLE_+0x8>
566: ff 25 4c 0a 20 00 jmp QWORD PTR [rip+0x200a4c] # 200fb8 <_GLOBAL_OFFSET_TABLE_+0x10>
56c: 0f 1f 40 00 nop DWORD PTR [rax+0x0]
...
# ^^^
# Here, these three dots are actually hiding another 0x10+ bytes filled with 0x0
# zip_close@plt should be here instead...
0000000000000580 <dlopen@plt>:
580: ff 25 42 0a 20 00 jmp QWORD PTR [rip+0x200a42] # 200fc8 <dlopen@GLIBC_2.2.5>
586: 68 00 00 00 00 push 0x0
58b: e9 d0 ff ff ff jmp 560 <.plt>
...
# ^^^
# Here, these three dots are actually hiding another 0x10+ bytes filled with 0x0
# zip_open@plt should be here instead...
When the program is run, dlopen() works fine and loads libzip into memory, but then when zip_open() gets called, it just generates a segmentation fault:
$ ./prog
Segmentation fault (core dumped)
Taking a look with a debugger, the issue is even more obvious (in case it wasn't already obvious enough). The PLT entries filled with zeroes just end up decoding to a bunch of add instructions dereferencing rax, which contains an invalid address and makes the program segfault and die:
0x5555555546e5 <main+43> lea rax, [rbp - 0x14]
0x5555555546e9 <main+47> mov rdx, rax
0x5555555546ec <main+50> mov esi, 9
0x5555555546f1 <main+55> lea rdi, [rip + 0xc6]
0x5555555546f8 <main+62> call dlopen@plt+16 <0x555555554590>
|
v ### Broken PLT entry (all 0x0, will cause a segfault):
0x555555554590 <dlopen@plt+16> add byte ptr [rax], al
0x555555554592 <dlopen@plt+18> add byte ptr [rax], al
0x555555554594 <dlopen@plt+20> add byte ptr [rax], al
0x555555554596 <dlopen@plt+22> add byte ptr [rax], al
0x555555554598 <dlopen@plt+24> add byte ptr [rax], al
0x55555555459a <dlopen@plt+26> add byte ptr [rax], al
0x55555555459c <dlopen@plt+28> add byte ptr [rax], al
0x55555555459e <dlopen@plt+30> add byte ptr [rax], al
### Next PLT entry...
0x5555555545a0 <__cxa_finalize@plt> jmp qword ptr [rip + 0x200a52] <0x7ffff7823520>
|
v
0x7ffff7823520 <__cxa_finalize> push r15
0x7ffff7823522 <__cxa_finalize+2> push r14
For question 3 I want to emphasize that the whole point of this is that I want to load the library myself, without linking it, so please refrain from just commenting that this is bad practice, or whatever else.
The above Should Just Work™, and indeed it does seem to...
No, it should not, and if it appears to, that's more of an accident. In general, using --unresolved-symbols=... is a really bad idea™, and will almost never do what you want.
The solution is trivial: you just need to look up zip_open and zip_close with dlsym() and call them through function pointers, like so:
#include <dlfcn.h>
#include <zip.h>

int main(void) {
    void *lib;
    /* Function pointers matching the libzip prototypes */
    zip_t *(*p_open)(const char *, int, int *);
    int (*p_close)(zip_t *);
    int err;
    zip_t *myzip;

    lib = dlopen("libzip.so", RTLD_LAZY | RTLD_GLOBAL);
    if (lib == NULL)
        return 1;

    /* Resolve each symbol explicitly instead of relying on the PLT */
    p_open = (zip_t *(*)(const char *, int, int *))dlsym(lib, "zip_open");
    if (p_open == NULL)
        return 1;
    p_close = (int (*)(zip_t *))dlsym(lib, "zip_close");
    if (p_close == NULL)
        return 1;

    myzip = p_open("myzip.zip", ZIP_CREATE | ZIP_TRUNCATE, &err);
    if (myzip == NULL)
        return 1;

    p_close(myzip);
    return 0;
}
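The lookups above repeat the same dlsym-and-check pattern and say nothing about why a lookup failed. As a rough sketch only, the pattern could be wrapped in a small helper that also prints the dlerror() message. The name load_symbol is hypothetical (it is not part of libdl or libzip), and clearing dlerror() before the call follows the convention described in the dlsym man page.

```c
#include <dlfcn.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper, not part of libdl or libzip.
 * Looks up `name` in the library handle `lib` and returns its address,
 * or NULL (after printing the loader's message to stderr) if dlsym failed.
 */
void *load_symbol(void *lib, const char *name)
{
    void *sym;
    const char *msg;

    dlerror();               /* clear any stale error state */
    sym = dlsym(lib, name);
    msg = dlerror();         /* non-NULL only if the lookup failed */
    if (msg != NULL) {
        fprintf(stderr, "dlsym(%s): %s\n", name, msg);
        return NULL;
    }
    return sym;
}
```

With this in place, each lookup in main would become a single line such as p_open = (zip_t *(*)(const char *, int, int *))load_symbol(lib, "zip_open");. On older glibc versions the program still has to be linked with -ldl.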