Tags: assembly, x86, gdb, exploit, machine-code

Is there a way to make GDB disassemble all memory in a specific range, without regard for instruction boundaries?


x/16i 0xdeadbeef yields:

   0x80481be <_init+22>:    shlb   $0x3a,-0x18(%ebp,%eax,1)
   0x80481c3 <_init+27>:    jle    0x80481c0 <_init+24>
   0x80481c5 <_init+29>:    .byte 0xf7
   0x80481c6 <_init+30>:    add    $0x8,%esp
   0x80481c9 <_init+33>:    pop    %ebx
   0x80481ca <_init+34>:    ret    

where the bytes from _init+22 to _init+27 have some juicy instructions "inside" them, if only I could see what they were.

x/16s 0xdeadbeef yields:

0x80481be <_init+22>:   "\300t\005\350:~\373\367\203\304\b"

which isn't very interesting.

I'm writing a ROP chain generator, so I need to find instructions that can be executed by jumping into the "middle" of other instructions. The (very) slow way is to simply run x/i 0xdeadbeef; x/i 0xdeadbef0, .... Is there a faster way?

I've tried x/i+<offset> 0xdeadbeef: the first instruction it yields is decoded from the middle of the instruction at 0xdeadbeef, but the subsequent instructions simply follow on from it instead of each starting at the next byte offset, so this is no faster than the slow way.


Solution

  • GDB is designed for the normal use-case of debugging code the CPU will execute, so disassembly of the next instruction starts at the end of the previous one. If you're looking mostly manually, by eye, you might define a GDB function to disassemble a short sequence starting at every byte offset in a range. You can enter this directly (one line at a time) on GDB's interactive command line or, I think, put it in your .gdbinit or in a file you source.

    This defines a function that takes 2 args: start address, and length in bytes of disassembly from each starting byte.

    define ROPsearch
      set $i = 0
      while ($i < 32)
         disas /r $arg0+$i, $arg0+$i + $arg1
         set $i=$i+1
      end
    end
    

    You could also parameterize the length (32 bytes) to search as $arg2. See the GDB manual (https://sourceware.org/gdb/current/onlinedocs/gdb.html/Define.html), it has examples.

    In interactive use, this looks like:

    $ gdb /lib/libc.so.6
    ... (yes, allow it to download debug symbols and source, although I really just want symbol names; Arch Linux separates debug symbols out of binary packages now.)
    (gdb) define ROPsearch
    Redefine command "ROPsearch"? (y or n) y
    Type commands for definition of "ROPsearch".
    End with a line saying just "end".
    >set $i = 0
    >while ($i < 32)
     >disas /r $arg0+$i, $arg0+$i + $arg1
     >set $i=$i+1
     >end
    >end
    
    (gdb) ROPsearch abort+4  16
    Dump of assembler code from 0x55555557a3e5 to 0x55555557a3f5:
       0x000055555557a3e5 <__GI_abort+4>:   55                      push   rbp
       0x000055555557a3e6 <__GI_abort+5>:   53                      push   rbx
       0x000055555557a3e7 <__GI_abort+6>:   48 8d 1d 62 37 1b 00    lea    rbx,[rip+0x1b3762]        # 0x55555572db50 <lock>
       0x000055555557a3ee <__GI_abort+13>:  48 81 ec a8 00 00 00    sub    rsp,0xa8
    End of assembler dump.
    Dump of assembler code from 0x55555557a3e6 to 0x55555557a3f6:
       0x000055555557a3e6 <__GI_abort+5>:   53                      push   rbx
       0x000055555557a3e7 <__GI_abort+6>:   48 8d 1d 62 37 1b 00    lea    rbx,[rip+0x1b3762]        # 0x55555572db50 <lock>
       0x000055555557a3ee <__GI_abort+13>:  48 81 ec a8 00 00 00    sub    rsp,0xa8
       0x000055555557a3f5 <__GI_abort+20>:  64 48 8b 04 25 28 00 00 00      mov    rax,QWORD PTR fs:0x28
    End of assembler dump.
    Dump of assembler code from 0x55555557a3e7 to 0x55555557a3f7:
       0x000055555557a3e7 <__GI_abort+6>:   48 8d 1d 62 37 1b 00    lea    rbx,[rip+0x1b3762]        # 0x55555572db50 <lock>
       0x000055555557a3ee <__GI_abort+13>:  48 81 ec a8 00 00 00    sub    rsp,0xa8
       0x000055555557a3f5 <__GI_abort+20>:  64 48 8b 04 25 28 00 00 00      mov    rax,QWORD PTR fs:0x28
    End of assembler dump.
    Dump of assembler code from 0x55555557a3e8 to 0x55555557a3f8:
       0x000055555557a3e8 <__GI_abort+7>:   8d 1d 62 37 1b 00       lea    ebx,[rip+0x1b3762]        # 0x55555572db50 <lock>
       0x000055555557a3ee <__GI_abort+13>:  48 81 ec a8 00 00 00    sub    rsp,0xa8
       0x000055555557a3f5 <__GI_abort+20>:  64 48 8b 04 25 28 00 00 00      mov    rax,QWORD PTR fs:0x28
    End of assembler dump.
    ...
    
    Dump of assembler code from 0x55555557a3fc to 0x55555557a40c:
       0x000055555557a3fc <__GI_abort+27>:  00 00                   add    BYTE PTR [rax],al
       0x000055555557a3fe <__GI_abort+29>:  48 89 84 24 98 00 00 00 mov    QWORD PTR [rsp+0x98],rax
       0x000055555557a406 <__GI_abort+37>:  31 c0                   xor    eax,eax
       0x000055555557a408 <__GI_abort+39>:  64 48 8b 2c 25 10 00 00 00      mov    rbp,QWORD PTR fs:0x10
    End of assembler dump.
    Dump of assembler code from 0x55555557a3fd to 0x55555557a40d:
       0x000055555557a3fd <__GI_abort+28>:  00 48 89                add    BYTE PTR [rax-0x77],cl
       0x000055555557a400 <__GI_abort+31>:  84 24 98                test   BYTE PTR [rax+rbx*4],ah
       0x000055555557a403 <__GI_abort+34>:  00 00                   add    BYTE PTR [rax],al
       0x000055555557a405 <__GI_abort+36>:  00 31                   add    BYTE PTR [rcx],dh
       0x000055555557a407 <__GI_abort+38>:  c0 64 48 8b 2c          shl    BYTE PTR [rax+rcx*2-0x75],0x2c
       0x000055555557a40c <__GI_abort+43>:  25 10 00 00 00          and    eax,0x10
    ...
    

    abort was just a symbol name that appeared early in objdump -drwC -Mintel /lib/libc.so.6 | less. My .gdbinit uses set disassembly-flavor intel.

    So it's noisy, with 2 lines of start/end "of assembler dump" around every block, but that's not so bad when looking at blocks of multiple instructions. (GDB will disassemble to the end of an instruction if the disas range includes its first byte.)

    What matters is the sequence of instructions the CPU will execute from any given start point, so I used a 16-byte range rather than a 1-byte range, which would show only the single instruction at each start point.

    Of course this isn't filtering for sequences that end with ret or pop reg/jmp reg. Probably it's possible to do that with GDB commands. Or there are disassembler libraries like capstone and XED if you want to write a whole program to analyze a chunk of machine code you paste, or even to search bytes in executables and libraries, or in running processes.
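
    As a rough sketch of the filtering idea, assuming you have the raw bytes of the region in a Python bytes object: the hypothetical helper below (candidate_gadget_starts is my name, not a GDB or capstone API) finds every 0xC3 byte, which decodes as ret when execution starts exactly there, and lists the start addresses within max_back bytes before it. It does not decode anything, so each candidate still has to be checked with GDB, capstone, or XED to see whether its decode actually reaches that ret. The example bytes are transcribed from the disassembly in the question, assuming the standard encodings 0x5b for pop %ebx and 0xc3 for ret.

```python
def candidate_gadget_starts(code, base=0, max_back=8):
    """Yield (ret_address, [candidate start addresses]) for each 0xC3 byte.

    No decoding happens here: a start address is only a candidate, and a
    disassembler must confirm that decoding from it reaches the ret.
    """
    RET = 0xC3  # one-byte near return on x86
    for pos, byte in enumerate(code):
        if byte != RET:
            continue
        first = max(0, pos - max_back)
        yield base + pos, [base + s for s in range(first, pos + 1)]


if __name__ == "__main__":
    # Bytes of _init+22 .. _init+34 from the question (assumed encodings, see above).
    code = bytes.fromhex("c07405e83a7efbf783c4085bc3")
    for ret_addr, starts in candidate_gadget_starts(code, base=0x80481BE, max_back=4):
        for start in starts:
            # One GDB command per candidate; paste them or put them in a file to source.
            print(f"disas /r {start:#x},{ret_addr + 1:#x}")
```

    This only narrows down where to look: instead of disassembling from every byte in a range, you disassemble from the few bytes in front of each possible ret.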


    x86 machine code is a byte stream that's not self-synchronizing, but does decode uniquely from a given starting point. For typical use-cases, it's not useful to see how it would have decoded if a jump target address had been wrong. GDB doesn't have an option to do that.

    Despite not being truly self-synchronizing (you can't look at a byte and see if it's the start or end of an instruction or not), quite a few bytes are prefixes or opcodes for single-byte instructions, so re-sync usually happens within 2 to 10 bytes. This means there could be quite a bit of duplication in the output of the simple GDB function I wrote; IDK if you'd want to filter that out or not. Probably not, just see all the different options that lead to a ret.
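
    To make the re-sync point concrete, here is a small sketch using the instruction lengths read off the ROPsearch output above (offsets are relative to abort). The helper names chain and resync_offset are hypothetical. Because decoding from a given start is unique, a table of start offset to instruction length is enough to follow each decode chain and find where two chains meet; everything after that point is duplicated output.

```python
# Instruction length at each start offset (relative to abort), read from the
# disas /r output in the transcript above.
LENGTHS = {
    6: 7, 7: 6, 13: 7, 20: 9,                  # chains starting at abort+6 and abort+7
    27: 2, 29: 8, 37: 2, 39: 9,                # chain starting at abort+27
    28: 3, 31: 3, 34: 2, 36: 2, 38: 5, 43: 5,  # chain starting at abort+28
}


def chain(lengths, start):
    """Offsets of the instruction starts reached when decoding begins at start."""
    offsets = [start]
    while offsets[-1] in lengths:  # stop once we run past the known data
        offsets.append(offsets[-1] + lengths[offsets[-1]])
    return offsets


def resync_offset(lengths, a, b):
    """First offset where the decode chains from a and b coincide, or None."""
    seen = set(chain(lengths, a))
    for off in chain(lengths, b):
        if off in seen:
            return off
    return None


if __name__ == "__main__":
    print(resync_offset(LENGTHS, 6, 7))    # chains from abort+6 and abort+7
    print(resync_offset(LENGTHS, 27, 28))  # chains from abort+27 and abort+28
```

    With this data, the chains from abort+6 and abort+7 meet at abort+13, while the chains from abort+27 and abort+28 only meet at abort+48, so how quickly re-sync happens depends a lot on the bytes involved.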