I'm working on a project where I need to make a running program execute a function on demand. For this I am using ptrace. I know that this is possible because GDB does it.
Right now I am using an adapted version of the code found at https://github.com/eklitzke/ptrace-call-userspace, which shows how to call fprintf in a target program.
The problem I am facing appears when the called function uses nanosleep(). If nanosleep() is called inside the function called by the tracer, the tracee crashes with a SIGSEGV, but only after the sleep has finished. If the function is called normally by the tracee itself, everything works properly.
I concluded that the problem is related to how the function is called, probably something to do with the tracee's stack or its register values. For example, I already checked that the stack is 16-byte aligned when entering the function.
The code of the tracer is in the GitHub repository above (the differences are the called function and that I removed the arguments).
The tracee is simply a dummy process that prints its PID every second.
Code for the function that is called:
#include <stdio.h>
#include <time.h>

void hello()
{
    struct timespec tim1;
    tim1.tv_sec = 1;
    tim1.tv_nsec = 0;
    struct timespec tim2;
    nanosleep(&tim1, &tim2);
    puts("Hello World!!!");
}
When the traced program crashes the backtrace is as follows:
#0 0xfffffffffffffff7 in ?? ()
#1 0x00007effb0e6e6e0 in hello () at hello.c:10
#2 0x00007effb195c005 in ?? ()
#3 0x00007effb1435cc4 in __sleep (seconds=0) at ../sysdeps/unix/sysv/linux/sleep.c:137
#4 0x00000000004005de in main ()
The register values of the dumped core:
rax 0xfffffffffffffff7 -9
rbx 0x7ffc858a0e40 140722548903488
rcx 0x7effb1435e12 139636655742482
rdx 0x7ffc858a0df8 140722548903416
rsi 0x7ffc858a0df8 140722548903416
rdi 0x7ffc858a0e08 140722548903432
rbp 0x7ffc858a0e18 0x7ffc858a0e18
rsp 0x7ffc858a0df0 0x7ffc858a0df0
r8 0xffffffffffffffff -1
r9 0x0 0
r10 0x7ffc858a0860 140722548901984
r11 0x246 582
r12 0x7ffc858a0ec0 140722548903616
r13 0x7ffc858a1100 140722548904192
r14 0x0 0
r15 0x0 0
rip 0xfffffffffffffff7 0xfffffffffffffff7
eflags 0x10246 [ PF ZF IF RF ]
cs 0x33 51
ss 0x2b 43
ds 0x0 0
es 0x0 0
fs 0x0 0
gs 0x0 0
Output of the tracer:
./call_hello -p 17611
their %rip 0x7effb1435e10
allocated memory at 0x7effb195c000
executing jump to mmap region
successfully jumped to mmap area
their lib 0x7effb0e6e000
their func 0x7effb0e6e000
Adding rel32 to new_text[0]Adding func_delta to new_text[1-4]Adding TRAP to new_text[5]inserting code/data into the mmap area at 0x7effb195c000
setting the registers of the remote process
continuing execution
PTRACE_CONT unexpectedly got status Unknown signal 2943
If I remove the call to nanosleep everything works as expected - "Hello World!!!" is printed. As I said previously, the segmentation fault only occurs after the requested sleep of 1 second. I don't know how nanosleep is causing the instruction pointer to hold 0xfffffffffffffff7.
Any suggestions or ideas on what I should look into in order to solve this issue? Thanks in advance!
I am testing this on CentOS Linux release 7.6.1810.
The issue is as follows:
Your call-hello program writes the two instructions
syscall
jmp %rax
to the memory that the current value of the %rip register (instruction pointer) points to. Since your target program has an (implicit) call to nanosleep() in its main loop, %rip almost always points to the return address of that syscall (somewhere in libc). At this point, the injected syscall executes mmap() and the jmp then jumps to its return value (the freshly mmapped region).
But later, in your hello() function, you call nanosleep() again. At its return address, the injected code above is still in place! Some arbitrary syscall is executed (depending on the content of %rax; after a successful nanosleep() that is presumably 0, which is read() on x86-64), and it fails with error code -9 (EBADF), leaving 0xfffffffffffffff7 in %rax. Then the jmp %rax jumps right there, killing your process.
So the best solution is to find a place where you can inject and execute the 4 bytes of code without overwriting other code. Alternatively, you can restore the original code before continuing to execute hello() and put the injected code back after hello() has finished (after the trap), for example like this:
   // update the mmap area
   printf("inserting code/data into the mmap area at %p\n", mmap_memory);
   if (poke_text(pid, mmap_memory, new_text, NULL, sizeof(new_text))) {
     goto fail;
   }
-  if (poke_text(pid, rip, new_word, NULL, sizeof(new_word))) {
+  if (poke_text(pid, rip, old_word, NULL, sizeof(old_word))) {
     goto fail;
   }
Later, however, you have to briefly reinstall the syscall code to make the munmap() call happen, for example here:
   if (ptrace(PTRACE_SETREGS, pid, NULL, &newregs)) {
     perror("PTRACE_SETREGS");
     goto fail;
   }
+  if (poke_text(pid, rip, new_word, NULL, sizeof(new_word))) {
+    goto fail;
+  }
   new_word[0] = 0xff; // JMP %rax
   new_word[1] = 0xe0; // JMP %rax
Now it should work as you expect.