There are a few scenarios I'm curious about:
Will all these scenarios be limited to the total number of PCIe lanes supported by the CPU? Does the answer differ between Intel systems and ARM systems?
Will all these scenarios be limited to the total number of PCIe lanes supported by the CPU?
PCIe is not precisely a bus, certainly not in the way that PCI or ISA were. It's a set of point-to-point connections between peripherals and the PCIe root complex (which is usually part of the CPU itself). Any given root complex supports some fixed number of PCIe lanes, and each lane is connected to exactly one device. Lanes are usually allocated in sets: for instance, it's typical to connect 16 PCIe lanes to a GPU.
So, yes. Any communication between PCIe devices, or between a device and memory, ordinarily passes through the root complex in the CPU, and will be limited by the number of PCIe lanes connecting the device (or devices) to that root complex.
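To put rough numbers on "limited by the number of lanes", here is a minimal sketch in Python. The per-lane transfer rates and line encodings are the published figures for PCIe 1.0 through 5.0. The function names and the example lane budget are my own, purely for illustration, and the result ignores packet and protocol overhead, so real throughput will be somewhat lower.

```python
# Raw per-lane transfer rate (GT/s) and line-encoding efficiency per PCIe generation.
# Gen 1 and 2 use 8b/10b encoding; gen 3 and later use 128b/130b.
PCIE_GENERATIONS = {
    1: (2.5, 8 / 10),
    2: (5.0, 8 / 10),
    3: (8.0, 128 / 130),
    4: (16.0, 128 / 130),
    5: (32.0, 128 / 130),
}


def link_bandwidth_gbytes(generation, lanes):
    """Approximate one-direction bandwidth of a link in GB/s (hypothetical helper).

    Accounts for line encoding only, not packet headers or flow control.
    """
    rate_gt, efficiency = PCIE_GENERATIONS[generation]
    # One transfer carries one bit per lane; divide by 8 to convert bits to bytes.
    return rate_gt * efficiency * lanes / 8


def lanes_fit(lane_budget, device_lanes):
    """Return True if every device can get its requested lanes at once.

    lane_budget is the number of lanes the root complex provides (an assumed input).
    """
    return sum(device_lanes) <= lane_budget


if __name__ == "__main__":
    # A gen 3 GPU on 16 lanes versus the same GPU dropped to 8 lanes.
    print(round(link_bandwidth_gbytes(3, 16), 2), "GB/s on x16")
    print(round(link_bandwidth_gbytes(3, 8), 2), "GB/s on x8")
    # Assumed example: a 16-lane CPU asked to feed an x16 GPU plus an x4 SSD.
    print(lanes_fit(16, [16, 4]))
```

When the requested lanes exceed the budget, as in the last call, something has to give: typically a device is negotiated down to a narrower link, which halves its bandwidth for each halving of the lane count.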