GDB watch DMA controller memory access on x86

I run a FreeBSD-based kernel as a QEMU/KVM guest.

I'm working on a SCSI driver for a FreeBSD-based OS kernel and have an issue where the read system call produces corrupted data.

To troubleshoot the problem I run the kernel in QEMU and would like to trace the memory accesses performed by the DMA controller, which is responsible for delivering data into the user-supplied buffer. In QEMU's case the controller is the QEMU SCSI/ATA disk device. So I tried to set a watchpoint on the user-supplied buffer.

Example:

After setting a breakpoint on int sys_read(struct thread *td, struct read_args *uap), I got a buffer passed in from the user:

(gdb) p uap->buf
 $5 = 0x7ffd4593f000 "User buffer initial data"
(gdb) watch *0x7ffd4593f000
 Hardware watchpoint 7: *0x7ffd4593f000
(gdb) c

The problem is that the watchpoint is never hit. Why? I'd like to use it to examine the data being transferred from the device into memory.

Is it even possible to watch accesses made by the DMA controller?

UPD:

I managed to hit the watchpoint. It looks as follows:

Thread 3 hit Hardware watchpoint 7: *0x7ffd4593f000

Old value = 1750343715
New value = 1819043144
0x00005586ec68f146 in ?? ()
(gdb) bt
#0  0x00005586ec68f146 in ?? ()
#1  0x654b206f6c6c6548 in ?? ()
#2  0x6f7266206c656e72 in ?? ()
#3  0x707372657375206d in ?? ()
#4  0x6574617200656361 in ?? ()
#5  0x6d69562079622064 in ?? ()
#6  0x20230a2e312e3820 in ?? ()
#7  0x2079616d20756f59 in ?? ()
#8  0x2074692074696465 in ?? ()
#9  0x7227756f79206669 in ?? ()
...
#509 0x0a632e6e69616d2f in ?? ()
#510 0x32312c30352c347c in ?? ()
#511 0x323436312c37312c in ?? ()
#512 0x222c323037343132 in ?? ()
#513 0x0000000000000000 in ?? ()

This is very likely correct, since the first 4 bytes shown as Old value and New value match what I expect to be read.

But the strange thing is that it is hit only once after QEMU startup. Subsequent read system calls do not trigger the watchpoint. To hit it again I have to restart QEMU and set the watchpoint again.

What might this stack trace mean?

Solution

The watchpoint handling provided for QEMU's gdbstub is really only intended to work with accesses done by the guest CPU, not for those done by DMA from emulated devices. I'm surprised it hits at all, and not surprised that the behaviour is a bit weird.
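On the backtrace itself: the frame "addresses" look like ASCII text rather than return addresses, which suggests gdb is unwinding through buffer contents instead of a real call stack. The following is a minimal Python sketch, not anything gdb or QEMU provides, that decodes the first few values copied from the question as little-endian bytes, the way x86 stores them. The helper name decode_le is made up for illustration.

```python
# Frame "addresses" #1..#4 copied from the backtrace in the question.
frames = [
    0x654b206f6c6c6548,
    0x6f7266206c656e72,
    0x707372657375206d,
    0x6574617200656361,
]


def decode_le(value, size=8):
    """Return the bytes of an integer in little-endian (x86 memory) order."""
    return value.to_bytes(size, "little")


# Concatenate the frames in order to recover the underlying memory contents.
stack_bytes = b"".join(decode_le(f) for f in frames)
print(stack_bytes)  # b'Hello Kernel from userspace\x00rate'

# The watchpoint's Old/New values are 4-byte words and decode the same way.
old_value = decode_le(1750343715, 4)
new_value = decode_le(1819043144, 4)
print(old_value, new_value)  # b'# Th' b'Hell'
```

The new value decodes to "Hell", the start of the same "Hello Kernel from userspace" string that fills the bogus frames, so what gdb prints as a stack is most likely just the data being copied.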

If you can reproduce this on a QEMU setup using pure emulation (i.e. not KVM, hvf or whpx), then my suggestion for debugging this would be to run QEMU itself under a host gdb, and use the host gdb to set a watchpoint on the host memory that corresponds to the relevant bit of guest memory. Unfortunately that requires some knowledge of QEMU internals to find the host memory, and more generally to understand what's going on and relate what QEMU is doing to what the guest is executing.
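To illustrate the address bookkeeping involved, here is a rough Python sketch under stated assumptions: 4 KiB pages, and that you have already found the host virtual address backing the guest page by some means (recent QEMU monitors have gva2gpa and gpa2hva commands that may help, but check your version). The host address below is a made-up placeholder, and split_page and host_address are hypothetical helpers, not QEMU APIs.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages


def split_page(addr):
    """Split an address into (page base, offset within the page)."""
    return addr & ~(PAGE_SIZE - 1), addr & (PAGE_SIZE - 1)


def host_address(guest_va, host_page_base):
    """Translation works per page, so only the in-page offset carries over."""
    _, offset = split_page(guest_va)
    return host_page_base + offset


guest_buf = 0x7ffd4593f000           # uap->buf from the question
page, offset = split_page(guest_buf)
print(hex(page), hex(offset))        # 0x7ffd4593f000 0x0

hypothetical_host_page = 0x7f1234567000  # placeholder, NOT a real mapping
watch_at = host_address(guest_buf + 0x10, hypothetical_host_page)
print(hex(watch_at))                 # 0x7f1234567010
```

Keep in mind that a buffer spanning several pages need not be contiguous in guest physical or host memory, so each page has to be translated separately.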

Supplementary debugging tip: if you can take a 'snapshot' just before the bug is triggered, that gives you a shorter reproduction case, which is "load from snapshot and trigger bug" rather than "boot entire guest OS and userspace then trigger bug". More detail in this blog post.

Supplementary debugging tip 2: if you take the "debug QEMU with host gdb" approach, you can use the reverse-debugger rr, which is very handy for memory-corruption bugs, because you can say "now execute backwards to whatever last touched this memory". More info in this post.