How is PTRACE_SINGLESTEP implemented?

To the best of my knowledge (I could be wrong), there's no way to just execute one instruction on an x86-64 system. Perhaps instead you could execute the instruction followed by the 'ud2' opcode to trigger a signal -- but then you have to worry about the instruction modifying control flow and going somewhere else.

Yet, if I understand correctly, the ptrace() syscall has a SINGLESTEP option that will execute only a single instruction. How is this implemented? I can't imagine the kernel has some kind of disassembler to identify the instruction and reason about it. So, is there some kind of architectural feature it's using that I don't know about? Or something entirely different?

Solution

Yes, x86 has an architectural single-step flag: the Trap Flag (TF) in EFLAGS/RFLAGS. With TF set, the CPU raises a debug exception after executing one instruction. Returning from kernel to user-space gives the kernel a chance to set both RIP and RFLAGS at the same time, so it can set the single-step flag for user-space without having it trigger on a kernel instruction.

For some reason, the Trap Flag has its own Wikipedia article! See also Wikipedia's EFLAGS article.

See the x86 tag wiki for links to Intel's architecture manuals which document all of this.
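From the debugger's side, the hardware single-step described above is just a ptrace request. The following is a minimal sketch, not how any particular debugger does it: the helper name count_single_steps is hypothetical, the child simply spins so there is always another instruction to step, and the RIP printout is only compiled on x86-64.

```c
#include <signal.h>
#include <stddef.h>
#include <stdio.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/user.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork a child, single-step it max_steps times, and
 * return how many steps ended in a SIGTRAP stop (-1 on setup failure). */
int count_single_steps(int max_steps)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child: ask to be traced, stop, then spin forever so the parent
         * always has another instruction to step over. */
        ptrace(PTRACE_TRACEME, 0, NULL, NULL);
        raise(SIGSTOP);
        for (;;)
            ;
    }

    int status;
    int steps = 0;

    /* Wait for the child's initial SIGSTOP before stepping it. */
    if (waitpid(pid, &status, 0) > 0 && WIFSTOPPED(status)) {
        while (steps < max_steps) {
            /* The kernel arranges for one user-space instruction to run. */
            if (ptrace(PTRACE_SINGLESTEP, pid, NULL, NULL) < 0)
                break;
            if (waitpid(pid, &status, 0) < 0)
                break;
            if (!WIFSTOPPED(status) || WSTOPSIG(status) != SIGTRAP)
                break;
            steps++;
#if defined(__x86_64__)
            struct user_regs_struct regs;
            if (ptrace(PTRACE_GETREGS, pid, NULL, &regs) == 0)
                printf("step %d: rip=0x%llx\n", steps,
                       (unsigned long long)regs.rip);
#endif
        }
    } else {
        steps = -1;
    }

    /* The child never exits on its own, so kill and reap it. */
    kill(pid, SIGKILL);
    waitpid(pid, &status, 0);
    return steps;
}
```

Calling count_single_steps(8) from main should report eight stops, each with a different RIP on x86-64, provided the environment allows ptrace (some sandboxes block it).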

"Perhaps instead you could execute the instruction followed by the 'ud2' opcode to trigger a signal"

Then you'd need code to determine x86 instruction lengths, to know where to set a software breakpoint. And you wouldn't use ud2, you'd use int3 which exists for this purpose.
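For comparison, here is a rough sketch of the software-breakpoint approach on x86-64 Linux, assuming the debugger already knows the address it wants to stop at. The names breakpoint_target and hits_int3_breakpoint are made up for illustration. It relies on a forked child having the same address-space layout as its parent, so the function's address in the parent is also valid in the child.

```c
#include <errno.h>
#include <signal.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/user.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical target: the child calls this after it has been stopped. */
static void breakpoint_target(void)
{
    /* Empty asm statement so the body is not optimized away entirely. */
    __asm__ volatile("" ::: "memory");
}

/* Hypothetical helper (x86-64 Linux only): plant int3 (0xCC) on the first
 * byte of breakpoint_target in a forked child. Returns 1 if the child
 * trapped with RIP one byte past the breakpoint, 0 if not, -1 on failure. */
int hits_int3_breakpoint(void)
{
    /* Volatile pointer forces a real call to the function's address. */
    void (*volatile fn)(void) = breakpoint_target;
    uintptr_t addr = (uintptr_t)breakpoint_target;

    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        ptrace(PTRACE_TRACEME, 0, NULL, NULL);
        raise(SIGSTOP);          /* let the parent patch our code */
        fn();
        _exit(0);
    }

    int status;
    int result = -1;
    if (waitpid(pid, &status, 0) > 0 && WIFSTOPPED(status)) {
        /* PEEKTEXT returns the data itself, so errors show up in errno. */
        errno = 0;
        long word = ptrace(PTRACE_PEEKTEXT, pid, (void *)addr, NULL);
        if (errno == 0) {
            /* Replace only the lowest byte (the first code byte) with int3. */
            long patched = (word & ~0xffL) | 0xcc;
            if (ptrace(PTRACE_POKETEXT, pid, (void *)addr, (void *)patched) == 0 &&
                ptrace(PTRACE_CONT, pid, NULL, NULL) == 0 &&
                waitpid(pid, &status, 0) > 0) {
                struct user_regs_struct regs;
                result = 0;
                /* int3 is a trap: RIP points just past the 0xCC byte. */
                if (WIFSTOPPED(status) && WSTOPSIG(status) == SIGTRAP &&
                    ptrace(PTRACE_GETREGS, pid, NULL, &regs) == 0 &&
                    regs.rip == addr + 1)
                    result = 1;
            }
        }
    }

    /* A real debugger would restore `word`, rewind RIP by one, and resume;
     * this sketch just kills and reaps the child. */
    kill(pid, SIGKILL);
    waitpid(pid, &status, 0);
    return result;
}
```

The hard part the answer is pointing at is not in this sketch: choosing addr. Here it is handed over for free, whereas emulating single-step would need an instruction-length decoder to find it.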

x86 also has debug registers (dr0..7) which can set hardware breakpoints without modifying the code, or can monitor for access or write to a given data address. (GDB hbreak uses those, as do GDB watchpoints on constant addresses)

But for jump/call/ret and other instructions that might have a special effect on RIP, you'd need to decode and emulate the instruction to figure out its destination and put an int3 there. A memory-indirect jump using an addressing mode like jmp qword [fs: rax] would require the debugger to know the FS segment base to even know what address it will load a pointer from. (I assume you can get this with ptrace as easily as actual register values, unlike inside the program being debugged, where rdfsbase is a relatively new extension.) So it's possible, as long as your debugger has stopped all other threads so that no other thread can modify the jump target pointer between the debugger reading it and execution continuing (a TOCTOU race).
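On x86-64 Linux with glibc, struct user_regs_struct from <sys/user.h> has fs_base and gs_base fields, so the FS base does come back from the same PTRACE_GETREGS call as the ordinary registers. A small sketch, with the helper name read_child_fs_base being hypothetical:

```c
#include <signal.h>
#include <stddef.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/user.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper (x86-64 Linux only): return the FS segment base of a
 * stopped, traced child, or 0 on failure. */
unsigned long long read_child_fs_base(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return 0;
    if (pid == 0) {
        ptrace(PTRACE_TRACEME, 0, NULL, NULL);
        raise(SIGSTOP);          /* stop so the parent can read registers */
        _exit(0);
    }

    int status;
    unsigned long long base = 0;
    struct user_regs_struct regs;

    /* GETREGS fills in the general registers plus fs_base and gs_base. */
    if (waitpid(pid, &status, 0) > 0 && WIFSTOPPED(status) &&
        ptrace(PTRACE_GETREGS, pid, NULL, &regs) == 0)
        base = regs.fs_base;

    kill(pid, SIGKILL);
    waitpid(pid, &status, 0);
    return base;
}
```

In a normal glibc process the result is non-zero, because FS points at the thread's TLS block. A debugger emulating jmp qword [fs: rax] would add the tracee's RAX to this base and read the pointer from there with PTRACE_PEEKDATA.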

Fun fact: not all ISAs have hardware support for PTRACE_SINGLESTEP.

Case in point, the Linux kernel used to emulate it for ARM, but that required an ARM disassembler in the kernel to place a breakpoint at the next instruction to execute, even if that was a branch target. It was removed in ~2011; now ptrace(PTRACE_SINGLESTEP) returns -EIO on ARM.

They just ripped out all that complexity instead of trying to make it SMP-safe and keep up with every new instruction-set extension like Thumb-2 and so on. (http://lists.infradead.org/pipermail/linux-arm-kernel/2011-February/041324.html)

So debuggers have to manually use breakpoints on such ISAs instead of having the kernel do it for them. If that means other threads notice a debug-break opcode in memory temporarily, that's not the kernel's problem. (Normally debuggers like GDB do stop all threads while you're single-stepping.)

And it means debuggers have to decode branch instructions themselves to figure out where to put the breakpoint, including register-indirect and/or predicated branches.