Problem trying to call user function using ptrace - nanosleep causes crash

I'm working on a project where I need to make a running program execute a function on demand. For this I am using ptrace. I know that this is possible because GDB does it.

Right now I am using an adapted version of the code found at https://github.com/eklitzke/ptrace-call-userspace (that program demonstrates how to call fprintf in a target program).

The problem I am facing appears when the called function uses nanosleep(). If nanosleep() is called inside the function invoked by the tracer, the tracee crashes with a SIGSEGV, but only after the sleep has finished. If the function is called normally by the tracee itself, everything works properly.

I concluded that the problem is related to how the function is called, probably something to do with the tracee's stack or its register values. For example, I have already checked that the stack is 16-byte aligned when entering the function.

The tracer code is the one in the GitHub repository above (the only differences are the called function and that I removed the arguments).

The tracee is simply a dummy process that prints its PID every second.

Code for the function that is called:

#include <stdio.h>
#include <time.h>

void hello()
{
    struct timespec tim1;
    tim1.tv_sec = 1;
    tim1.tv_nsec = 0;
    struct timespec tim2;
    nanosleep(&tim1, &tim2);    
    puts("Hello World!!!");
}

When the traced program crashes the backtrace is as follows:

#0  0xfffffffffffffff7 in ?? ()
#1  0x00007effb0e6e6e0 in hello () at hello.c:10
#2  0x00007effb195c005 in ?? ()
#3  0x00007effb1435cc4 in __sleep (seconds=0) at ../sysdeps/unix/sysv/linux/sleep.c:137
#4  0x00000000004005de in main ()

The register values of the dumped core:

rax            0xfffffffffffffff7       -9
rbx            0x7ffc858a0e40   140722548903488
rcx            0x7effb1435e12   139636655742482
rdx            0x7ffc858a0df8   140722548903416
rsi            0x7ffc858a0df8   140722548903416
rdi            0x7ffc858a0e08   140722548903432
rbp            0x7ffc858a0e18   0x7ffc858a0e18
rsp            0x7ffc858a0df0   0x7ffc858a0df0
r8             0xffffffffffffffff       -1
r9             0x0      0
r10            0x7ffc858a0860   140722548901984
r11            0x246    582
r12            0x7ffc858a0ec0   140722548903616
r13            0x7ffc858a1100   140722548904192
r14            0x0      0
r15            0x0      0
rip            0xfffffffffffffff7       0xfffffffffffffff7
eflags         0x10246  [ PF ZF IF RF ]
cs             0x33     51
ss             0x2b     43
ds             0x0      0
es             0x0      0
fs             0x0      0
gs             0x0      0

Output of the tracer:

./call_hello -p 17611
their %rip           0x7effb1435e10
allocated memory at  0x7effb195c000
executing jump to mmap region
successfully jumped to mmap area
their lib            0x7effb0e6e000
their func           0x7effb0e6e000
Adding rel32 to new_text[0]Adding func_delta to new_text[1-4]Adding TRAP to new_text[5]inserting code/data into the mmap area at 0x7effb195c000
setting the registers of the remote process
continuing execution
PTRACE_CONT unexpectedly got status Unknown signal 2943
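
For reference, the status 2943 printed by the tracer can be decoded with the standard wait macros. This is a minimal sketch, assuming the number is the raw waitpid() status; the helper name describe_wait_status is hypothetical and not part of the tracer.

```c
#include <assert.h>
#include <signal.h>
#include <stdio.h>
#include <sys/wait.h>

/* Hypothetical helper (not part of the tracer): describe a raw status
 * value as returned by waitpid(). The description is written into buf. */
void describe_wait_status(int status, char *buf, size_t len)
{
    assert(buf != NULL && len > 0);
    if (WIFSTOPPED(status))
        snprintf(buf, len, "stopped by signal %d", WSTOPSIG(status));
    else if (WIFSIGNALED(status))
        snprintf(buf, len, "killed by signal %d", WTERMSIG(status));
    else if (WIFEXITED(status))
        snprintf(buf, len, "exited with code %d", WEXITSTATUS(status));
    else
        snprintf(buf, len, "unrecognized status 0x%x", (unsigned)status);
}
```

Calling describe_wait_status(2943, buf, sizeof buf) gives "stopped by signal 11", and 11 is SIGSEGV on Linux x86-64, which matches the crash described above.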

If I remove the call to nanosleep everything works as expected - "Hello World!!!" is printed. As I said previously, the segmentation fault only occurs after the requested sleep of 1 second. I don't know how nanosleep is causing the instruction pointer to hold 0xfffffffffffffff7. Any suggestions or ideas on what I should look into in order to solve this issue? Thanks in advance!

I am testing this on CentOS Linux release 7.6.1810.

Solution

The issue is as follows:

Your call_hello program writes the two instructions

syscall
jmp %rax

to the memory that the current value of the %rip register (instruction pointer) points to. Since your target program has an (implicit, via sleep()) call to nanosleep() in its main loop, %rip almost always points to the return address of that syscall (somewhere in libc). From there, the injected syscall executes mmap(), and the instruction after it jumps to the return value (the freshly mmapped region).

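But later, in your hello() function, you call nanosleep() again. At its return address, the injected code above is still in place! So once the sleep finishes, the injected syscall instruction runs with whatever happens to be in %rax as the syscall number. Most likely this is read() (syscall number 0 on x86-64), because a successful nanosleep() returns 0 and %rdi does not hold a valid file descriptor at that point. The syscall fails with error code -9 (EBADF), so %rax now contains 0xfffffffffffffff7. Then the injected jump through %rax goes right to that address, killing your process.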
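
The value 0xfffffffffffffff7 seen in %rax and %rip is simply -9 read as an unsigned 64-bit number, since the Linux kernel reports a failed syscall by returning -errno. A small sketch of that decoding follows; the helper name raw_syscall_errno is hypothetical, and the [-4095, -1] error range is the convention used by the Linux syscall ABI.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical helper: interpret a raw x86-64 Linux syscall return value
 * as found in %rax. Returns the errno value if it encodes an error,
 * or 0 if the value is not an error code. */
int raw_syscall_errno(uint64_t rax)
{
    int64_t ret = (int64_t)rax;

    /* Failures are reported as -errno, i.e. a value in [-4095, -1]. */
    if (ret < 0 && ret >= -4095)
        return (int)-ret;
    return 0;
}

/* Sanity check against the register dump from the question. */
void check_core_dump_value(void)
{
    assert(raw_syscall_errno(UINT64_C(0xfffffffffffffff7)) == EBADF);
}
```

With the value from the core dump, raw_syscall_errno() returns 9, which is EBADF ("Bad file descriptor") on Linux.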

So the best solution is to find a place where you can inject and execute the 4 bytes of code without overwriting other code. Alternatively, you can restore the original code before continuing into hello() and put the injected code back after hello() has finished (after the trap), for example like this:

// update the mmap area
printf("inserting code/data into the mmap area at %p\n", mmap_memory);
if (poke_text(pid, mmap_memory, new_text, NULL, sizeof(new_text))) {
  goto fail;
}

- if (poke_text(pid, rip, new_word, NULL, sizeof(new_word))) {
+ if (poke_text(pid, rip, old_word, NULL, sizeof(old_word))) {
  goto fail;
}

Later, however, you have to briefly reinstall the syscall code so that the munmap() call can happen, for example here:

if (ptrace(PTRACE_SETREGS, pid, NULL, &newregs)) {
  perror("PTRACE_SETREGS");
  goto fail;
}

+ if (poke_text(pid, rip, new_word, NULL, sizeof(new_word))) {
+   goto fail;
+ }

new_word[0] = 0xff; // JMP %rax
new_word[1] = 0xe0; // JMP %rax

Now it should work as you expect.
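
The save/patch/restore cycle described above can also be sketched in isolation. This is only an illustration under assumptions: an x86-64 Linux tracer that has already attached to and stopped the tracee. The helper names patch_word, install_injected_code and restore_original_code are hypothetical; they are not part of the linked repository, whose snippets above use poke_text instead.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>
#include <sys/ptrace.h>
#include <sys/types.h>

/* The 4 injected bytes: SYSCALL (0f 05) followed by JMP %rax (ff e0). */
static const uint8_t INJECTED[4] = { 0x0f, 0x05, 0xff, 0xe0 };

/* Return a copy of orig whose first len bytes (in memory order) are
 * replaced by code. The remaining bytes of the word are preserved, which
 * matters because PTRACE_POKETEXT always writes a whole word. */
long patch_word(long orig, const uint8_t *code, size_t len)
{
    assert(len <= sizeof(long));
    long patched = orig;
    memcpy(&patched, code, len);
    return patched;
}

/* Save the word at addr in the stopped tracee into *saved, then overwrite
 * its first bytes with the injected code. Returns 0 on success, -1 on error. */
int install_injected_code(pid_t pid, void *addr, long *saved)
{
    errno = 0;
    long orig = ptrace(PTRACE_PEEKTEXT, pid, addr, NULL);
    if (orig == -1 && errno != 0)
        return -1; /* -1 can be valid data, so errno decides */
    *saved = orig;

    long patched = patch_word(orig, INJECTED, sizeof(INJECTED));
    return ptrace(PTRACE_POKETEXT, pid, addr, (void *)patched) == -1 ? -1 : 0;
}

/* Write the previously saved word back, removing the injected code. */
int restore_original_code(pid_t pid, void *addr, long saved)
{
    return ptrace(PTRACE_POKETEXT, pid, addr, (void *)saved) == -1 ? -1 : 0;
}
```

The idea is to call install_injected_code() right before each injected syscall (mmap, and later munmap) and restore_original_code() before letting the tracee run any code that may pass through the patched address, such as hello() calling nanosleep().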