The PCIe specifications clearly state the ordering rules.
According to them, a write can be executed before an earlier read. This rule seems to break the x86 ordering rules, which are strongly ordered.
What am I missing?
Reference from the PCIe specification (ordering rules table):
The second transaction (row) must be allowed to pass the first (column) to avoid deadlock.
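As a rough aid for reading that table, here is a minimal Python sketch of its main entries. It is simplified and written from memory of the base specification, so check it against the real table. It ignores ID-Based Ordering (IDO), the PCI/PCI-X bridge special cases and the same-Transaction-ID rule for completions. `Kind`, `may_pass` and the three answer strings are made-up names for illustration.

```python
from enum import Enum

class Kind(Enum):
    POSTED = "Posted Request"          # e.g. memory write, message (MSI)
    NON_POSTED = "Non-Posted Request"  # e.g. memory read, I/O or config write
    COMPLETION = "Completion"

# Possible answers to "may the later transaction (row) pass the earlier one (column)?"
MUST = "must be allowed to pass"   # "Yes" in the table
MAY = "permitted to pass"          # "Y/N" in the table
MUST_NOT = "must not pass"         # "No" in the table

def may_pass(later: Kind, earlier: Kind, relaxed_ordering: bool = False) -> str:
    """Simplified ordering rule; `relaxed_ordering` is the RO bit of `later`."""
    if earlier is Kind.NON_POSTED and later in (Kind.POSTED, Kind.COMPLETION):
        # Deadlock avoidance: a stalled request must not block writes or completions.
        return MUST
    if earlier is Kind.POSTED:
        if later is Kind.NON_POSTED:
            # A read must not overtake an earlier write (RO does not change this).
            return MUST_NOT
        # A write or completion may pass an earlier write only with Relaxed Ordering.
        return MAY if relaxed_ordering else MUST_NOT
    # Remaining cases (NP after NP, anything after a Completion): implementation's choice.
    return MAY
```

For example, `may_pass(Kind.POSTED, Kind.NON_POSTED)` gives the "must be allowed to pass" answer that the question is about, while `may_pass(Kind.NON_POSTED, Kind.POSTED)` says a read must never overtake an earlier write.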
PCIe agents may reorder Posted and Non-Posted requests for two main reasons:
- To avoid deadlocks.
- To allow performance optimizations.
The second bullet is straightforward to understand: an NVMe device may let a write pass a read if the two target different addresses and latching/caching the write is faster or easier.
Of course, in general, reordering writes and reads is not safe, not even for system memory, let alone for IO devices, where even reads can have side effects (e.g. clearing status bits).
However, devices (much more so than switches, which must follow the ordering rules and honor the Relaxed Ordering bit) are free to implement their own optimizations.
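The kind of device-side optimization described above can be sketched as a toy scheduler in Python. This is a hypothetical policy, not something the spec or any particular NVMe controller is known to implement: `Request`, `can_pass` and `service_order` are made-up names, and reads to addresses listed in `side_effect_addrs` (say, read-to-clear status registers) are treated as barriers that writes never overtake.

```python
from dataclasses import dataclass
from typing import List, Optional, Set

@dataclass
class Request:
    kind: str                   # "read" or "write"
    addr: int
    data: Optional[int] = None  # payload for writes

def can_pass(write: Request, read: Request, side_effect_addrs: Set[int]) -> bool:
    """Hypothetical rule: a write may overtake an older read only if they target
    different addresses and the read has no side effects."""
    return write.addr != read.addr and read.addr not in side_effect_addrs

def service_order(queue: List[Request], side_effect_addrs: Set[int]) -> List[Request]:
    """Order in which the hypothetical device services its FIFO. Reads keep their
    relative order, writes keep theirs, and a read never passes a write."""
    out: List[Request] = []
    for req in queue:
        pos = len(out)
        if req.kind == "write":
            # Walk back over older reads this write may safely overtake;
            # stop at any write so write-write order is preserved.
            while (pos > 0 and out[pos - 1].kind == "read"
                   and can_pass(req, out[pos - 1], side_effect_addrs)):
                pos -= 1
        out.insert(pos, req)
    return out
```

With the queue `read 0x10, read 0x20, write 0x30, write 0x10` and no side-effecting addresses, the write to 0x30 jumps ahead of both reads, while the write to 0x10 passes the read of 0x20 but stays behind the read of 0x10. Marking 0x20 as side-effecting turns that read into a barrier, and the same queue is then serviced in plain FIFO order.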
Switches and bridges may find it necessary to reorder transactions when they do complex mappings. This brings us to the first bullet above: deadlocks.
I've never really dug into this topic, so what follows is rather hand-wavy.
Some devices may need to perform other transactions before they can handle (i.e. complete) the transaction they just received. The PCIe specifications give these examples:
For example, an IOMMU may read its page table entries from a secure memory area hosted on an ad-hoc PCIe device. So, upon receiving a downstream memory read, the IOMMU issues another read request to the secure memory. In turn, the secure memory device may need to generate an MSI (due to a missing entry, for example), which has to go back through the IOMMU. But the IOMMU is still waiting for its own request to finish, so the system deadlocks.
This is a contrived example, for sure, but the general idea is that PCIe topologies and functionality are complex enough to create deadlocks, and the PCIe ordering rules make it possible to resolve some of them.
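The deadlock can be reduced to a toy model in Python, under heavy simplification: a single inbound queue where the original read sits at the head and cannot complete until an MSI (a posted write) queued behind it has been delivered. `Txn` and `deliver_all` are hypothetical names, not anything from the specification. With strict FIFO order nothing can move; letting posted requests pass the non-posted ones ahead of them breaks the cycle, which is the reasoning behind the "must be allowed to pass ... to avoid deadlock" entries.

```python
from typing import List, NamedTuple, Optional

class Txn(NamedTuple):
    name: str
    posted: bool                     # True for memory writes / MSIs
    waits_for: Optional[str] = None  # must be delivered before this one can complete

def deliver_all(queue: List[Txn], posted_may_pass: bool) -> Optional[List[str]]:
    """Drain a single queue; return the delivery order, or None on deadlock."""
    pending = list(queue)
    delivered: List[str] = []
    while pending:
        for i, txn in enumerate(pending):
            is_head = i == 0
            # A posted txn may bypass the non-posted txns ahead of it,
            # but never another posted txn (posted-posted order is kept).
            bypass = posted_may_pass and txn.posted and not any(t.posted for t in pending[:i])
            ready = txn.waits_for is None or txn.waits_for in delivered
            if (is_head or bypass) and ready:
                delivered.append(txn.name)
                del pending[i]
                break
        else:
            return None  # no transaction could be delivered: deadlock
    return delivered
```

With `queue = [Txn("read", posted=False, waits_for="msi"), Txn("msi", posted=True)]`, strict ordering (`posted_may_pass=False`) returns `None`, while allowing the bypass delivers the MSI first and then the read.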
Of course, the protocol semantics still hold: if you read from address X and then write to address X, you can be sure that any agent will obey "as-if" causality.
As for the x86 memory model, it has nothing to do with this, at least not directly. The x86 memory model determines the order and the type of the requests that arrive at the Root Complex. It's your responsibility to make sure these requests arrive in the correct order (e.g. by using UC MMIO) or not (e.g. by using WC MMIO).
Once they reach the Root Complex, the PCIe protocol follows its own rules.