Trapping malloc in ptrace

I'm trying to trap malloc calls made by a process that I'm tracing with ptrace.

I've been able to hook malloc calls, so I should be able to capture them through some custom module; however, that only works with dynamic libraries (that is, when the -static flag is not used).

Is there a way that I can do this in a generic fashion?

If we look at the following assembly, I know where I need to capture. I just don't know how:

  .file "new.c"
  .section  .rodata
.LC0:
  .string "Hello World"
  .text
  .globl  main
  .type main, @function
main:
.LFB2:
  .cfi_startproc
  pushq %rbp
  .cfi_def_cfa_offset 16
  .cfi_offset 6, -16
  movq  %rsp, %rbp
  .cfi_def_cfa_register 6
  subq  $16, %rsp
  movl  $4, %edi
  call  malloc ;<= TRAP HERE
  movq  %rax, -8(%rbp)
  movl  $.LC0, %edi
  call  puts
  movq  -8(%rbp), %rax
  movq  %rax, %rdi
  call  free
  leave
  .cfi_def_cfa 7, 8
  ret
  .cfi_endproc
.LFE2:
  .size main, .-main
  .ident  "GCC: (SUSE Linux) 4.8.1 20130909 [gcc-4_8-branch revision 202388]"
  .section  .note.GNU-stack,"",@progbits

From ptrace(2),

PTRACE_SINGLESTEP

Restart the stopped tracee as for PTRACE_CONT, but arrange for the tracee to be stopped at the next entry to or exit from a system call, or after execution of a single instruction, respectively. (The tracee will also, as usual, be stopped upon receipt of a signal.)

So I'm fairly certain that I'll need that option. From a tutorial I've read, I can single step; however, none of the output makes any sense, especially if the traced program has some sort of output statement. Here is a brief excerpt of the trace when the traced program prints something:

RIP: 7ff6cc4387c2 Instruction executed: 63158b48c35d5e41
RIP: 7ff6cc4387c4 Instruction executed: 2f0663158b48c35d
RIP: 7ff6cc4387c5 Instruction executed: 2f0663158b48c3
RIP: 400c38 Instruction executed: 7500e87d83e84589
RIP: 400c3b Instruction executed: b93c7500e87d83
RIP: 400c3f Instruction executed: ba00000000b93c75
RIP: 400c41 Instruction executed: ba00000000b9
RIP: 400c46 Instruction executed: be00000000ba
RIP: 400c4b Instruction executed: bf00000000be
RIP: 400c50 Instruction executed: b800000000bf
RIP: 400c55 Instruction executed: fe61e800000000b8
RIP: 400c5a Instruction executed: bafffffe61e8
RIP: 400ac0 Instruction executed: a68002015a225ff
RIP: 400ac6 Instruction executed: ff40e90000000a68
RIP: 400acb Instruction executed: 9a25ffffffff40e9
RIP: 400a10 Instruction executed: 25ff002015f235ff
RIP: 400a16 Instruction executed: 1f0f002015f425ff
RIP: 7ff6ccf6c160 Instruction executed: 2404894838ec8348
RIP: 7ff6ccf6c164 Instruction executed: 244c894824048948
RIP: 7ff6ccf6c168 Instruction executed: 54894808244c8948
RIP: 7ff6ccf6c16d Instruction executed: 7489481024548948
....
Hello world
....

Why does the value of the IP change so drastically? Is this because I'm in kernel mode beforehand?

Also, it looks like the output of the instruction executed isn't lined up correctly (like it is split across lines), but that could just be me trying to put a pattern where there isn't one.

Anyway, here is the program that I'm running to produce that output (warning: nasty C/C++ mixture):

#include <iostream>
#include <sys/ptrace.h>
#include <unistd.h>
#include <asm/unistd.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <sys/syscall.h>
#include <sys/reg.h>
#include <sys/user.h>

#include <iomanip>

using namespace std;

/// For when dealing with different architectures.
#if __WORDSIZE == 64
#define REG(reg) reg.orig_rax
#else
#define REG(reg) reg.orig_eax
#endif

int main()
{
  pid_t child;
  long orig_eax;
  const int long_size = sizeof(long);

  child = fork();

  long ins;
  if(child == 0)
  {
    ptrace(PTRACE_TRACEME, 0, NULL, NULL);
    execl("./dummy", "dummy", NULL);
  }
  else
  {
    ptrace(PTRACE_ATTACH, child, NULL, NULL);
    ptrace(PTRACE_SYSCALL, child, NULL, NULL);
    int status;
    union u {
      long val;
      char chars[long_size];
    }data;
    struct user_regs_struct regs;
    int start = 0;
    long ins;
    while(1)
    {
      wait(&status);
      if(WIFEXITED(status))
        break;
      ptrace(PTRACE_GETREGS,child, NULL, &regs);
      ins = ptrace(PTRACE_PEEKTEXT, child, regs.rip, NULL);
      cout << "RIP: " << hex << regs.rip << " Instruction executed: " << ins << endl;
      ptrace(PTRACE_SINGLESTEP, child, NULL, NULL);
    }
    ptrace(PTRACE_DETACH, child, NULL, NULL);
  }
}

If there is any other information needed, please let me know. I know that I'm a bit verbose, but if this is answered hopefully it'll provide enough information to the next person trying to learn ptrace as well.

Solution

There's no practical way to hook malloc that will work in all statically linked executables. In order to hook it, by whatever means, you need to know its address. The only way you can do this is by looking up malloc in the executable's symbol table, but since it's statically linked there's no guarantee that it has one. A dynamic library must have a symbol table so it can be dynamically linked, but as a statically linked program is completely linked it doesn't need one.

That said, many statically linked executables will have a symbol table, as debugging is pretty much impossible without one. The additional size it takes up isn't much of an issue any more. You can use the nm command to inspect any executables you might want to use your application with to get an idea of how this issue might affect you.

Assuming you have an executable with symbols, the next problem is how to actually read the symbol in your program. The ELF format isn't that simple, so you probably want to use something like BFD (from binutils) or libelf. You could also just use nm from the command line and supply the address manually to your program.
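For a rough idea of what reading the symbol yourself involves, here is a minimal sketch that scans the .symtab section using only the definitions in <elf.h>, without BFD or libelf. The function name find_symbol is made up for this example. It assumes a well-formed 64-bit ELF file and does no bounds checking, so treat it as an illustration rather than a robust parser. For a non-PIE static executable the returned value is the runtime address; for a position-independent executable it is an offset that still has to be added to the load base.

```c
#include <elf.h>
#include <fcntl.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: return the st_value of the function symbol `name`
 * found in the .symtab of the 64-bit ELF file at `path`, or 0 if the file
 * cannot be read, is not 64-bit ELF, or has no such symbol (e.g. stripped).
 * Assumption: the file is well formed; offsets are not validated. */
uint64_t find_symbol(const char *path, const char *name)
{
    uint64_t addr = 0;
    struct stat st;

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return 0;
    if (fstat(fd, &st) < 0 || (size_t)st.st_size < sizeof(Elf64_Ehdr)) {
        close(fd);
        return 0;
    }

    /* Map the whole file read-only so headers can be read in place. */
    unsigned char *base = mmap(NULL, (size_t)st.st_size, PROT_READ,
                               MAP_PRIVATE, fd, 0);
    close(fd);
    if (base == MAP_FAILED)
        return 0;

    const Elf64_Ehdr *eh = (const Elf64_Ehdr *)base;
    if (memcmp(eh->e_ident, ELFMAG, SELFMAG) == 0 &&
        eh->e_ident[EI_CLASS] == ELFCLASS64) {
        const Elf64_Shdr *sh = (const Elf64_Shdr *)(base + eh->e_shoff);

        for (int i = 0; i < eh->e_shnum && addr == 0; i++) {
            if (sh[i].sh_type != SHT_SYMTAB)
                continue;

            /* sh_link of a symbol table names its string table section. */
            const Elf64_Sym *sym = (const Elf64_Sym *)(base + sh[i].sh_offset);
            const char *strtab =
                (const char *)(base + sh[sh[i].sh_link].sh_offset);
            size_t count = sh[i].sh_size / sizeof(Elf64_Sym);

            for (size_t j = 0; j < count; j++) {
                if (ELF64_ST_TYPE(sym[j].st_info) == STT_FUNC &&
                    sym[j].st_value != 0 &&
                    strcmp(strtab + sym[j].st_name, name) == 0) {
                    addr = sym[j].st_value;
                    break;
                }
            }
        }
    }

    munmap(base, (size_t)st.st_size);
    return addr;
}
```

Calling find_symbol("./dummy", "malloc") on a statically linked, unstripped ./dummy should give the same address that nm prints for malloc; a result of 0 means you are in the "no symbol table" case described above.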

Once you have the address of malloc you can then trace calls to it using ptrace by setting a breakpoint at the start of the function. Setting the breakpoint is simple. ptrace transfers a whole word at a time, so use PTRACE_PEEKTEXT to read the word at the start of the function, save its first byte somewhere, and use PTRACE_POKETEXT to write the word back with that byte changed to 0xCC, the opcode for the Intel x86 breakpoint instruction (INT 3). Then when malloc is called the traced process will be sent a SIGTRAP signal which you can intercept.

What you need to do then is more complicated. You'll need to perform a series of steps something like the following:

1. Read the registers and/or stack to find the arguments to malloc and record them.
2. Use PTRACE_POKETEXT to restore the original first byte of the function.
3. Read the return address from the top of the stack.
4. Set a breakpoint at the location malloc will return to, saving the old value.
5. Subtract 1 (the size of the breakpoint instruction) from the program counter (EIP/RIP).
6. Resume running the traced process. The next SIGTRAP you intercept will be after malloc returns.
7. Record the value malloc returns by reading the return register (EAX/RAX).
8. Use PTRACE_POKETEXT to remove the breakpoint at the return address.
9. Use PTRACE_POKETEXT to put the breakpoint back at the start of malloc.
10. Subtract 1 from the program counter.
11. Resume running the traced process.
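The steps above can be sketched roughly as follows for x86-64 Linux. This is an untested outline rather than a finished tracer: the names patch_low_byte, swap_byte, run_to_trap and trace_malloc are invented here, error handling is mostly omitted, and it assumes a single-threaded tracee that is already stopped under ptrace, with the size argument in RDI as the System V AMD64 calling convention specifies. It also assumes the first SIGTRAP after entering malloc is the one at the return address.

```c
#include <signal.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/user.h>
#include <sys/wait.h>

/* Return `word` with its lowest-addressed (least significant) byte replaced. */
long patch_low_byte(long word, unsigned char byte)
{
    return (long)(((unsigned long)word & ~0xFFUL) | byte);
}

/* Write `byte` at `addr` in the tracee and return the byte it replaced. */
static unsigned char swap_byte(pid_t pid, uintptr_t addr, unsigned char byte)
{
    long word = ptrace(PTRACE_PEEKTEXT, pid, (void *)addr, NULL);
    ptrace(PTRACE_POKETEXT, pid, (void *)addr,
           (void *)patch_low_byte(word, byte));
    return (unsigned char)(word & 0xFF);
}

/* Resume the tracee until it stops with SIGTRAP, forwarding other signals.
 * Returns 1 on SIGTRAP, 0 if the tracee exited or an error occurred. */
static int run_to_trap(pid_t pid)
{
    int status, sig = 0;
    for (;;) {
        if (ptrace(PTRACE_CONT, pid, NULL, (void *)(long)sig) == -1)
            return 0;
        if (waitpid(pid, &status, 0) == -1)
            return 0;
        if (WIFEXITED(status) || WIFSIGNALED(status))
            return 0;
        if (WIFSTOPPED(status) && WSTOPSIG(status) == SIGTRAP)
            return 1;
        sig = WIFSTOPPED(status) ? WSTOPSIG(status) : 0;
    }
}

#if defined(__x86_64__)
/* Log every malloc call made by the stopped tracee `pid`, where
 * `malloc_addr` is the runtime address of malloc in the tracee. */
void trace_malloc(pid_t pid, uintptr_t malloc_addr)
{
    struct user_regs_struct regs;
    unsigned char entry_byte = swap_byte(pid, malloc_addr, 0xCC);

    while (run_to_trap(pid)) {
        ptrace(PTRACE_GETREGS, pid, NULL, &regs);
        if (regs.rip - 1 != malloc_addr)
            continue;                          /* some unrelated trap */

        unsigned long long size = regs.rdi;    /* 1: first argument */
        swap_byte(pid, malloc_addr, entry_byte);            /* 2 */
        uintptr_t ret = (uintptr_t)ptrace(PTRACE_PEEKDATA, pid,
                                          (void *)regs.rsp, NULL); /* 3 */
        unsigned char ret_byte = swap_byte(pid, ret, 0xCC); /* 4 */
        regs.rip -= 1;                                      /* 5 */
        ptrace(PTRACE_SETREGS, pid, NULL, &regs);

        if (!run_to_trap(pid))                              /* 6 */
            break;
        ptrace(PTRACE_GETREGS, pid, NULL, &regs);
        printf("malloc(%llu) = %#llx\n", size, regs.rax);   /* 7 */

        swap_byte(pid, ret, ret_byte);                      /* 8 */
        swap_byte(pid, malloc_addr, 0xCC);                  /* 9 */
        regs.rip -= 1;                                      /* 10 */
        ptrace(PTRACE_SETREGS, pid, NULL, &regs);
    }                                  /* 11: the loop resumes the tracee */
}
#endif
```

Because the entry breakpoint is removed while a call is in flight, this sketch does not see any malloc calls made from inside malloc itself, and it would need per-thread state to cope with a multi-threaded tracee.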

There's probably something I haven't thought of, but that's the sort of thing you need to do.

If you only wanted to work with code you compiled yourself then there are much easier options, like using glibc's built-in support for memory allocation hooks.