
How does QEMU's pcie_host convert a physical address to a PCIe address?


I am learning the implementation of QEMU, and I have a question. As we know, on real hardware, when the CPU accesses an address belonging to a PCI device, the PCI host bridge takes responsibility for converting it to a PCI address. QEMU provides pcie_host.c to emulate the PCIe host. In this file pcie_mmcfg_data_write is implemented, but there is nothing about the conversion of a physical address to a PCI address.

I did a test in QEMU using gdb:

  • Firstly, I added the edu device, which is a very simple PCI device, to qemu.
  • When I try to turn on Memory Space Enable (Mem- to Mem+) with setpci -s 00:02.0 04.b=2, qemu stops in the function pcie_mmcfg_data_write.
static void pcie_mmcfg_data_write(void *opaque, hwaddr mmcfg_addr,
                                  uint64_t val, unsigned len)
{
    PCIExpressHost *e = opaque;
    PCIBus *s = e->pci.bus;
    PCIDevice *pci_dev = pcie_dev_find_by_mmcfg_addr(s, mmcfg_addr);
    uint32_t addr;
    uint32_t limit;

    if (!pci_dev) {
        return;
    }
    addr = PCIE_MMCFG_CONFOFFSET(mmcfg_addr);
    limit = pci_config_size(pci_dev);
    pci_host_config_write_common(pci_dev, addr, limit, val, len);
}
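
A note on what mmcfg_addr means here: it is an offset into the ECAM (MMCONFIG) window, with the bus number, devfn and config-space offset packed into fixed bit fields. Below is my own minimal sketch of the decoding, mirroring the PCIE_MMCFG_* macros from hw/pci/pcie_host.h; 0x10004 is the mmcfg_addr=65540 that shows up in the backtrace below.

#include <stdint.h>
#include <stdio.h>

/* ECAM layout used by pcie_host.c: bits [28:20] select the bus,
 * bits [19:12] the devfn, bits [11:0] the config-space offset. */
static unsigned mmcfg_bus(uint64_t addr)        { return (addr >> 20) & 0x1ff; }
static unsigned mmcfg_devfn(uint64_t addr)      { return (addr >> 12) & 0xff; }
static unsigned mmcfg_confoffset(uint64_t addr) { return addr & 0xfff; }

int main(void)
{
    uint64_t mmcfg_addr = 0x10004;   /* = 65540 */
    unsigned devfn = mmcfg_devfn(mmcfg_addr);

    /* Prints "bus 0, device 2, function 0, offset 0x4": the Command
     * register of 00:02.0, which is exactly what setpci 04.b=2 writes. */
    printf("bus %u, device %u, function %u, offset 0x%x\n",
           mmcfg_bus(mmcfg_addr), devfn >> 3, devfn & 7,
           mmcfg_confoffset(mmcfg_addr));
    return 0;
}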

It is obvious that the PCIe host uses this function to find the device and perform the config write. Using bt I get:

#0  pcie_mmcfg_data_write
    (opaque=0xaaaaac573f10, mmcfg_addr=65540, val=2, len=1)
    at hw/pci/pcie_host.c:39
#1  0x0000aaaaaae4e8a8 in memory_region_write_accessor
    (mr=0xaaaaac574520, addr=65540, value=0xffffe14703e8, size=1, shift=0, mask=255, attrs=...) 
    at /home/mrzleo/Desktop/qemu/memory.c:483
#2  0x0000aaaaaae4eb14 in access_with_adjusted_size
    (addr=65540, value=0xffffe14703e8, size=1, access_size_min=1, access_size_max=4, access_fn=
    0xaaaaaae4e7c0 <memory_region_write_accessor>, mr=0xaaaaac574520, attrs=...) at /home/mrzleo/Desktop/qemu/memory.c:544
#3  0x0000aaaaaae51898 in memory_region_dispatch_write
    (mr=0xaaaaac574520, addr=65540, data=2, op=MO_8, attrs=...)
    at /home/mrzleo/Desktop/qemu/memory.c:1465
#4  0x0000aaaaaae72410 in io_writex
    (env=0xaaaaac6924e0, iotlbentry=0xffff000e9b00, mmu_idx=2, val=2, 
    addr=18446603336758132740, retaddr=281473269319356, op=MO_8)
    at /home/mrzleo/Desktop/qemu/accel/tcg/cputlb.c:1084
#5  0x0000aaaaaae74854 in store_helper
    (env=0xaaaaac6924e0, addr=18446603336758132740, val=2, oi=2, retaddr=281473269319356, op=MO_8) 
    at /home/mrzleo/Desktop/qemu/accel/tcg/cputlb.c:1954
#6  0x0000aaaaaae74d78 in helper_ret_stb_mmu
    (env=0xaaaaac6924e0, addr=18446603336758132740, val=2 '\002', oi=2, retaddr=281473269319356) 
    at /home/mrzleo/Desktop/qemu/accel/tcg/cputlb.c:2056
#7  0x0000ffff9a3b47cc in code_gen_buffer ()
#8  0x0000aaaaaae8d484 in cpu_tb_exec
    (cpu=0xaaaaac688c00, itb=0xffff945691c0 <code_gen_buffer+5673332>)
    at /home/mrzleo/Desktop/qemu/accel/tcg/cpu-exec.c:172
#9  0x0000aaaaaae8e4ec in cpu_loop_exec_tb
    (cpu=0xaaaaac688c00, tb=0xffff945691c0 <code_gen_buffer+5673332>, 
    last_tb=0xffffe1470b78, tb_exit=0xffffe1470b70)
    at /home/mrzleo/Desktop/qemu/accel/tcg/cpu-exec.c:619
#10 0x0000aaaaaae8e830 in cpu_exec (cpu=0xaaaaac688c00)
    at /home/mrzleo/Desktop/qemu/accel/tcg/cpu-exec.c:732
#11 0x0000aaaaaae3d43c in tcg_cpu_exec (cpu=0xaaaaac688c00)
    at /home/mrzleo/Desktop/qemu/cpus.c:1405
#12 0x0000aaaaaae3dd4c in qemu_tcg_cpu_thread_fn (arg=0xaaaaac688c00)
    at /home/mrzleo/Desktop/qemu/cpus.c:1713
#13 0x0000aaaaab722c70 in qemu_thread_start (args=0xaaaaac715be0)
    at util/qemu-thread-posix.c:519
#14 0x0000fffff5af84fc in start_thread (arg=0xffffffffe3ff)
    at pthread_create.c:477
#15 0x0000fffff5a5167c in thread_start ()
    at ../sysdeps/unix/sysv/linux/aarch64/clone.S:78
  • Then I try to access the address of edu with devmem 0x10000000; qemu stops in edu_mmio_read. Using bt:
(gdb) bt
#0  edu_mmio_read 
    (opaque=0xaaaaae71c560, addr=0, size=4) 
        at hw/misc/edu.c:187
#1  0x0000aaaaaae4e5b4 in memory_region_read_accessor
    (mr=0xaaaaae71ce50, addr=0, value=0xffffe2472438, size=4, shift=0, mask=4294967295, attrs=...)
    at /home/mrzleo/Desktop/qemu/memory.c:434
#2  0x0000aaaaaae4eb14 in access_with_adjusted_size
    (addr=0, value=0xffffe2472438, size=4, access_size_min=4, access_size_max=8, access_fn=
    0xaaaaaae4e570 <memory_region_read_accessor>, mr=0xaaaaae71ce50, attrs=...) 
    at /home/mrzleo/Desktop/qemu/memory.c:544
#3  0x0000aaaaaae51524 in memory_region_dispatch_read1 
(mr=0xaaaaae71ce50, addr=0, pval=0xffffe2472438, size=4, attrs=...)
    at /home/mrzleo/Desktop/qemu/memory.c:1385
#4  0x0000aaaaaae51600 in memory_region_dispatch_read 
(mr=0xaaaaae71ce50, addr=0, pval=0xffffe2472438, op=MO_32, attrs=...)
    at /home/mrzleo/Desktop/qemu/memory.c:1413
#5  0x0000aaaaaae72218 in io_readx
    (env=0xaaaaac6be0f0, iotlbentry=0xffff04282ec0, mmu_idx=0, 
    addr=281472901758976, retaddr=281473196263360, access_type=MMU_DATA_LOAD, op=MO_32) 
    at /home/mrzleo/Desktop/qemu/accel/tcg/cputlb.c:1045
#6  0x0000aaaaaae738b0 in load_helper
    (env=0xaaaaac6be0f0, addr=281472901758976, oi=32, retaddr=281473196263360, 
    op=MO_32, code_read=false, full_load=0xaaaaaae73c68 <full_le_ldul_mmu>) 
    at /home/mrzleo/Desktop/qemu/accel/tcg/cputlb.c:1566
#7  0x0000aaaaaae73ca4 in full_le_ldul_mmu 
(env=0xaaaaac6be0f0, addr=281472901758976, oi=32, retaddr=281473196263360)
    at /home/mrzleo/Desktop/qemu/accel/tcg/cputlb.c:1662
#8  0x0000aaaaaae73cd8 in helper_le_ldul_mmu 
(env=0xaaaaac6be0f0, addr=281472901758976, oi=32, retaddr=281473196263360)
    at /home/mrzleo/Desktop/qemu/accel/tcg/cputlb.c:1669
#9  0x0000ffff95e08824 in code_gen_buffer 
()
#10 0x0000aaaaaae8d484 in cpu_tb_exec 
(cpu=0xaaaaac6b4810, itb=0xffff95e086c0 <code_gen_buffer+31491700>)
    at /home/mrzleo/Desktop/qemu/accel/tcg/cpu-exec.c:172
#11 0x0000aaaaaae8e4ec in cpu_loop_exec_tb
    (cpu=0xaaaaac6b4810, tb=0xffff95e086c0 <code_gen_buffer+31491700>, 
    last_tb=0xffffe2472b78, tb_exit=0xffffe2472b70)
    at /home/mrzleo/Desktop/qemu/accel/tcg/cpu-exec.c:619
#12 0x0000aaaaaae8e830 in cpu_exec 
(cpu=0xaaaaac6b4810) at /home/mrzleo/Desktop/qemu/accel/tcg/cpu-exec.c:732
#13 0x0000aaaaaae3d43c in tcg_cpu_exec 
(cpu=0xaaaaac6b4810) at /home/mrzleo/Desktop/qemu/cpus.c:1405
#14 0x0000aaaaaae3dd4c in qemu_tcg_cpu_thread_fn 
(arg=0xaaaaac6b4810) 
    at /home/mrzleo/Desktop/qemu/cpus.c:1713
#15 0x0000aaaaab722c70 in qemu_thread_start (args=0xaaaaac541610) at util/qemu-thread-posix.c:519
#16 0x0000fffff5af84fc in start_thread (arg=0xffffffffe36f) at pthread_create.c:477
#17 0x0000fffff5a5167c in thread_start () at ../sysdeps/unix/sysv/linux/aarch64/clone.S:78

It seems that qemu just locates the edu device directly, and the PCIe host does nothing in this procedure. I wonder: does qemu not implement the conversion here, and instead just use MemoryRegion to achieve polymorphism? If not, what does QEMU's PCIe host do in this procedure?


Solution

  • QEMU uses a set of data structures called MemoryRegions to model the address space that a CPU sees (the detailed API is documented in part in the developer docs).

    MemoryRegions can be built up into a tree, where at the "root" there is one 'container' MR which covers the whole 64-bit address space the guest CPU can see, and then MRs for blocks of RAM, devices, etc are placed into that root MR at appropriate offsets. Child MRs can also be containers which in turn contain further MRs. You can then find the MR corresponding to a given guest physical address by walking through the tree of MRs.
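
    For a concrete feel of this API, here is a minimal sketch of building such a tree (the region names, sizes and offsets are invented for illustration, and it assumes the usual QEMU headers such as "exec/memory.h", "qemu/units.h" and "qapi/error.h"):

    /* A hedged sketch, not taken from any real QEMU board model. */
    static MemoryRegion root;      /* container covering the guest PA space */
    static MemoryRegion ram;       /* a block of RAM */
    static MemoryRegion dev_mmio;  /* some device's registers */

    static void build_address_space_sketch(Object *owner,
                                           const MemoryRegionOps *dev_ops,
                                           void *dev_opaque)
    {
        /* The root container: empty until children are placed into it. */
        memory_region_init(&root, owner, "system", UINT64_MAX);

        /* A RAM child at an arbitrary offset. */
        memory_region_init_ram(&ram, owner, "ram", 1 * GiB, &error_fatal);
        memory_region_add_subregion(&root, 0x40000000, &ram);

        /* An I/O child: guest accesses are dispatched to dev_ops callbacks. */
        memory_region_init_io(&dev_mmio, owner, dev_ops, dev_opaque,
                              "dev-mmio", 0x1000);
        memory_region_add_subregion(&root, 0x10000000, &dev_mmio);
    }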

    The tree of MemoryRegions is largely built up statically when QEMU starts (because most devices don't move around), but it can also be changed dynamically in response to guest software actions. In particular, PCI works this way. When the guest OS writes to a PCI device BAR (which is in PCI config space) this causes QEMU's PCI host controller emulation code to place the MR corresponding to the device's registers into the MemoryRegion hierarchy at the correct place and offset (depending on what address the guest wrote to the BAR, ie where it asked for it to be mapped). Once this is done, the MR for the PCI device is like any other in the tree, and the PCI host controller code doesn't need to be involved in guest accesses to it.
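
    This is exactly how the edu device from the question works: in its realize function it creates the MR for its registers and hands it to the PCI core as BAR 0, and the mapping itself happens later. Roughly (condensed from hw/misc/edu.c; the wrapper name edu_pci_realize_sketch is mine):

    static void edu_pci_realize_sketch(PCIDevice *pdev, EduState *edu)
    {
        /* The MR whose callbacks are edu_mmio_read/edu_mmio_write. */
        memory_region_init_io(&edu->mmio, OBJECT(edu), &edu_mmio_ops, edu,
                              "edu-mmio", 1 * MiB);

        /* Register it as BAR 0. The device never picks its own address:
         * when the guest programs the BAR, the PCI core places &edu->mmio
         * into the MR hierarchy at the programmed guest physical address. */
        pci_register_bar(pdev, 0, PCI_BASE_ADDRESS_SPACE_MEMORY, &edu->mmio);
    }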

    As a performance optimisation, QEMU doesn't actually walk down a tree of MRs for every access. Instead, we first "flatten" the tree into a data structure (a FlatView) that directly says "for this range of addresses, it will be this MR; for this range, this MR", and so on. Secondly, QEMU's TLB structure can directly cache mappings from "guest virtual address" to "specific memory region". On first access it will do an emulated guest MMU page table walk to get from the guest virtual address to the guest physical address, and then it will look that physical address up in the FlatView to find either the real host RAM or the MemoryRegion that is mapped there, and it will add the "guest VA -> this MR" mapping to the TLB cache. Future accesses will hit in the TLB and need not repeat the work of converting to a physaddr and then finding the MR in the flatmap.

    This is what is happening in your backtrace: the io_readx() function is passed the guest virtual address and also the relevant part of the TLB data structure, and it can then directly find the target MR and the offset within it, so it can call memory_region_dispatch_read() to dispatch the read request to that MR's read callback function. (If this was the first access, the initial "MMU walk + FlatView lookup" work will have just been done in load_helper() before it calls io_readx().)
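
    To make the shape of that fast path concrete, here is a toy model with invented types and names; the real code is load_helper() and io_readx() in accel/tcg/cputlb.c, and it is far more involved (victim TLBs, MMU indexes, alignment checks, etc.):

    #include <stdint.h>
    #include <stdio.h>

    typedef uint64_t (*mr_read_fn)(uint64_t offset, unsigned size);

    typedef struct {                /* toy analogue of a QEMU TLB entry */
        uint64_t va_page;           /* guest virtual page it covers */
        mr_read_fn read;            /* read callback of the MR mapped there */
        uint64_t mr_off;            /* offset of that page within the MR */
        int valid;
    } TlbEntryModel;

    static uint64_t dev_read(uint64_t off, unsigned size)
    {
        printf("device MR callback: off=0x%llx size=%u\n",
               (unsigned long long)off, size);
        return 0;
    }

    /* Slow-path stand-in for "MMU page table walk + FlatView lookup":
     * here it is hard-coded to resolve every address to dev_read. */
    static void tlb_fill_model(TlbEntryModel *e, uint64_t va)
    {
        e->va_page = va & ~0xfffULL;
        e->read = dev_read;
        e->mr_off = 0;
        e->valid = 1;
    }

    static uint64_t load_model(TlbEntryModel *e, uint64_t va, unsigned size)
    {
        if (!e->valid || e->va_page != (va & ~0xfffULL)) {
            tlb_fill_model(e, va);          /* first access: slow path */
        }
        /* TLB hit: dispatch straight to the MR callback, cf. io_readx() */
        return e->read(e->mr_off + (va & 0xfff), size);
    }

    int main(void)
    {
        TlbEntryModel e = {0};
        load_model(&e, 0x10000000, 4);      /* fills the entry, then reads */
        load_model(&e, 0x10000004, 4);      /* pure hit, no walk repeated */
        return 0;
    }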

    Obviously, all this caching also implies that QEMU tracks events which mean the cached data is no longer valid so we can throw it away (eg if the guest writes to the BAR again to unmap it or to map it somewhere else; or if the MMU settings or page tables are changed to alter the guest virtual-to-physical mapping).
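
    The unmap side can be sketched with the real MemoryRegion API (unmap_device_sketch is a hypothetical helper; in QEMU the equivalent logic sits in the PCI core's BAR update path):

    static void unmap_device_sketch(MemoryRegion *container, MemoryRegion *dev_mr)
    {
        memory_region_transaction_begin();
        memory_region_del_subregion(container, dev_mr);
        /* Committing the transaction regenerates the FlatViews; cached
         * TLB mappings that pointed at dev_mr are invalidated as part
         * of this update. */
        memory_region_transaction_commit();
    }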