Tags: x86, cpu-architecture, memory-barriers, pci-e

PCIe ordering rules and x86, how are they compatible?


The PCIe spec states the ordering rules clearly:

  • A Posted Request must not pass another Posted Request
  • A Posted Request must be able to pass Non-Posted Requests to avoid deadlocks

This means a write can be executed before an earlier read. That seems to break x86's strongly ordered memory model.
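
To make my concern concrete, here is a minimal sketch (hypothetical register indices; regs points at a UC-mapped BAR):

    #include <stdint.h>

    #define STATUS_IDX   0   /* hypothetical register indices */
    #define DOORBELL_IDX 2

    static uint32_t poll_then_kick(volatile uint32_t *regs)
    {
        uint32_t status = regs[STATUS_IDX]; /* non-posted read request */
        regs[DOORBELL_IDX] = 1;             /* posted write request    */
        /* Per the rule above, the fabric must allow the posted write
         * to pass the non-posted read -- seemingly contradicting
         * x86's strong ordering. */
        return status;
    }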

What am I missing?

Reference to the PCIe specs:

[PCIe transaction ordering table from the spec]

The second transaction (row) must be allowed to pass the first (column) to avoid deadlock.
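
For reference, here is how I read the table in code form — a simplified encoding that ignores the Relaxed Ordering attribute and the spec's ID-based-ordering footnotes (the enum names are mine, not the spec's):

    #include <stdio.h>

    enum tlp  { POSTED, NON_POSTED, COMPLETION };
    enum rule { MUST_NOT_PASS, MAY_PASS, MUST_PASS };

    /* pass[second][first]: may/must the second transaction pass the first?
     * Entries I am unsure about are left as MAY_PASS ("Y/N" in the spec). */
    static const enum rule pass[3][3] = {
        /* first:           POSTED         NON_POSTED  COMPLETION */
        /* POSTED     */  { MUST_NOT_PASS, MUST_PASS,  MAY_PASS  },
        /* NON_POSTED */  { MUST_NOT_PASS, MAY_PASS,   MAY_PASS  },
        /* COMPLETION */  { MUST_NOT_PASS, MUST_PASS,  MAY_PASS  },
    };

    int main(void)
    {
        /* The rule the question is about: a posted write MUST be able
         * to pass an earlier non-posted read to avoid deadlocks. */
        printf("%d\n", pass[POSTED][NON_POSTED] == MUST_PASS); /* 1 */
        return 0;
    }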


Solution

  • PCIe agents may reorder Posted vs Non-Posted transactions for two main reasons:

    1. Avoid deadlocks
    2. Improve resource utilization (bandwidth, receive buffers, and so on)

    The second reason is straightforward to understand: an NVMe drive may let a write pass a read if the two target different addresses and latching/caching the write is faster/easier.
    Of course, reordering writes and reads is not possible in general, not even for system memory, let alone when accessing IO devices, where even reads can have side effects (e.g. clearing status bits).

    Devices, however (more so than switches, which must obey the Relaxed Ordering bit), are free to implement their own optimizations.
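
    As a sketch of what such a device-side optimization could look like (purely hypothetical device logic, not mandated by the spec):

        #include <stdbool.h>
        #include <stdint.h>

        struct req { uint64_t addr; uint32_t len; };

        /* Hypothetical internal policy: let a posted write overtake an
         * older queued read only if their byte ranges are disjoint, so
         * the read still returns pre-write data and "as-if" ordering
         * is preserved. */
        static bool write_may_pass_read(const struct req *wr,
                                        const struct req *rd)
        {
            return wr->addr + wr->len <= rd->addr ||
                   rd->addr + rd->len <= wr->addr;
        }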

    Switches and bridges may find it necessary to reorder transactions when they perform complex mappings. This brings us to the first reason above: deadlocks.
    I've never really dug into this topic, so what follows is hand-waved.
    Some devices may need to perform other transactions before they can handle (i.e. complete) the transaction they just received. The PCIe specifications give these examples:

    • Bridges to complex protocols that require state to be saved/restored to/from host memory, e.g. PCI Express to InfiniBand bridges.
    • Messaging engines that must do address translation based upon page tables stored in host memory.
    • UMA graphics devices that store their frame buffer in host memory.

    For example, an IOMMU may read its page-table entries from a secure memory area hosted on an ad-hoc PCIe device. Upon receiving a downstream memory read, the IOMMU issues another read request to the secure memory device. In turn, that device may need to generate an MSI (because of a missing entry, for example), which has to go back through the IOMMU, which is still waiting for its own read to finish, thus deadlocking.
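
    A toy model of that cycle (nothing here models real hardware; it only shows why letting the posted write overtake the stalled read restores forward progress):

        #include <stdbool.h>
        #include <stdio.h>

        enum kind { NP_READ, P_WRITE };

        int main(void)
        {
            /* Inbound queue of the translating device: the read at the
             * head cannot complete until the MSI (a posted write)
             * queued behind it is handled. */
            enum kind q[2] = { NP_READ, P_WRITE };
            bool allow_posted_to_pass = true; /* set false -> deadlock */
            bool msi_handled = false;

            for (int head = 0; head < 2; ) {
                if (q[head] == NP_READ && !msi_handled) {
                    if (allow_posted_to_pass && head + 1 < 2 &&
                        q[head + 1] == P_WRITE) {
                        msi_handled = true;   /* write overtakes read */
                        q[head + 1] = NP_READ;
                        head++;
                        continue;
                    }
                    puts("deadlock: the read blocks the write it depends on");
                    return 1;
                }
                if (q[head] == P_WRITE)
                    msi_handled = true;
                head++;
            }
            puts("all transactions completed");
            return 0;
        }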

    This is a contrived example for sure, but the general idea is that PCIe topologies and functionality are complex enough to generate deadlocks, and the PCIe ordering rules allow solving some of them.
    Of course, the protocol semantics still hold: if you read from address X and then write to address X, you can be sure that any agent will obey "as-if" causality.

    Regarding the x86 memory model: it has nothing to do with this, at least not directly. The x86 memory model determines the order and the type of the requests that arrive at the Root Complex. It's your responsibility to make sure those requests arrive in the correct order (e.g. by using UC MMIO) or to allow them to be reordered (e.g. by using WC MMIO).
    Once they are at the Root Complex, the PCIe protocol follows its own rules.
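
    As an illustration (a sketch assuming a WC-mapped BAR and a hypothetical register layout; with a UC mapping the fence would be unnecessary because UC accesses are already strongly ordered):

        #include <stdint.h>
        #include <x86intrin.h>

        /* On a write-combining mapping, x86 may merge and reorder
         * these stores, so a store fence is needed to make sure the
         * payload reaches the Root Complex before the doorbell write. */
        static void post_buffer(volatile uint32_t *regs,
                                const uint32_t *payload)
        {
            for (int i = 0; i < 16; i++)
                regs[i] = payload[i]; /* data writes, WC-buffered       */
            _mm_sfence();             /* drain WC buffers: data first   */
            regs[16] = 1;             /* doorbell (hypothetical offset) */
        }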