Tags: linux-kernel, driver, pci, pci-e, bus

What are the most common busmaster operations, and how are they better than regular DMA?


Can someone list the most common operations that use the bus-mastering provision of the host bus? I can list a few:

1) The GPU transfers the overall framebuffer to the video card using bus-mastering over PCI-e (in recent x86).

2) The ethernet card transfers a received packet to main-memory using bus-mastering.

3) I assume the hard-disk too uses bus-mastering to transfer blocks.

In this context, when do these devices/drives use bus-mastering, vs 3rd party DMA?

Recently, it seems the Linux kernel has started supporting something called P2P DMA within PCIe, where devices communicate directly among themselves. How is P2P DMA fundamentally different from regular bus-mastering DMA? My understanding is that, until now, bus mastering was only used by a device to transfer to or from a buffer created by the DMA subsystem, i.e. always to or from main memory, right? P2P DMA, I guess, is a provision that allows one to bypass main memory altogether. I also read somewhere that such provisions were already being used by some proprietary graphics drivers in high-end gaming systems, and that Linux is somewhat of a latecomer to the party.

Can someone provide a broad overview of the varieties of DMA available in modern systems, and some way to conceptually understand them, if there is one?

Edit: regular DMA changed to 3rd party DMA


Solution

  • TLDR: On modern tree-like high-speed buses such as PCI Express, almost every device connected is capable of initiating memory-access transactions (first-party DMA) to read from and write to system memory. These are the equivalent of bus-master operations on ancient shared buses like conventional PCI or ISA.

    Can someone list the most common operations that use the bus mastering provision of the host bus?

    In modern systems with a high-speed peripheral interconnect ("bus") like PCIe (which is a tree, not a bus), the most common operations are the so-called "Memory Write" and "Memory Read" requests, encoded as TLP packets. (Some examples here: http://xillybus.com/tutorials/pci-express-tlp-pcie-primer-tutorial-guide-1)
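    To give a feel for what such a TLP carries, here is a rough sketch of the fields in a 3-DW (32-bit address) memory request header. The struct is purely illustrative (the field names and packing are mine and do not reproduce the wire encoding); only the field meanings come from the PCIe spec:

    ```c
    /* Illustrative only: fields of a 3-DW PCIe memory request TLP header.
     * This does NOT reproduce the exact wire encoding. */
    #include <stdint.h>

    struct tlp_mem_req_hdr {
        /* DW0 */
        uint8_t  fmt;          /* 0b000 = Memory Read, 0b010 = Memory Write (with data) */
        uint8_t  type;         /* 0b00000 for memory requests */
        uint16_t length_dw;    /* payload length in 32-bit dwords (10 bits on the wire) */
        /* DW1 */
        uint16_t requester_id; /* bus/device/function of the initiating device */
        uint8_t  tag;          /* matches completions (read data) back to the request */
        uint8_t  byte_enables; /* first/last DW byte enables */
        /* DW2 */
        uint32_t address;      /* target memory address (bits 31:2), used for routing */
    };
    ```

    The switch or root complex routes these requests by the address field, which is also what the P2P DMA routing discussed further below relies on.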

    On ancient shared buses (ISA, conventional PCI), with multiple devices connected to electrically shared signals, this may be different. But the basic driver-writing instructions still apply, for example from the LDD book, Linux Device Drivers: https://www.oreilly.com/library/view/linux-device-drivers/0596000081/ch13s04.html

    1) The GPU transfers the overall framebuffer to the video card using bus-mastering over PCI-e (in recent x86).

    In recent video cards and GPUs with a PCIe interface (native PCIe, or PCIe via an integrated PCI-to-PCIe or AGP-to-PCIe bridge chip), the framebuffer is integrated into the card itself (https://en.wikipedia.org/wiki/Framebuffer: "Modern video cards contain framebuffer circuitry in their cores."). The GPU has no need to access the bus to output a frame; the GPU is the video card. (The GPU has circuits to convert the current frame image from local video memory into the video signal: a RAMDAC for VGA or the analog part of DVI-A/-I, and a TMDS or LVDS encoder for DVI-D, HDMI and DisplayPort.)

    2) The ethernet card transfers a received packet to main-memory using bus-mastering.

    Yes, the Ethernet controller (NIC) on an Ethernet card will write received packets to system memory using descriptors and rings set up by the network driver in the OS. It will also execute transmit descriptors written by the driver: to transmit a packet, it reads the packet headers and packet data from system memory. Both operations are DMA, and the DMA engine for these operations is in the network controller (for modern PCIe and PCI cards).
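    As a minimal sketch of what "descriptors and rings set up by the network driver" means: the driver allocates a descriptor ring, maps receive buffers for DMA, and hands the resulting bus addresses to the NIC, which then bus-masters incoming packets into them. The rx_desc layout below is hypothetical; only the dma_alloc_coherent()/dma_map_single() calls are the standard Linux DMA API:

    ```c
    /* Minimal sketch (not a real driver): publish RX buffers to a NIC.
     * The descriptor layout is hypothetical. */
    #include <linux/dma-mapping.h>
    #include <linux/slab.h>

    #define RX_RING_SIZE 256
    #define RX_BUF_SIZE  2048

    struct rx_desc {            /* hypothetical hardware descriptor */
        __le64 buf_addr;        /* bus address the NIC will DMA-write into */
        __le16 len;
        __le16 flags;
    } __packed;

    static int setup_rx_ring(struct device *dev, struct rx_desc **ring,
                             dma_addr_t *ring_dma, void *bufs[], dma_addr_t buf_dma[])
    {
        int i;

        /* Coherent memory for the descriptor ring: visible to CPU and device. */
        *ring = dma_alloc_coherent(dev, RX_RING_SIZE * sizeof(struct rx_desc),
                                   ring_dma, GFP_KERNEL);
        if (!*ring)
            return -ENOMEM;

        for (i = 0; i < RX_RING_SIZE; i++) {
            bufs[i] = kmalloc(RX_BUF_SIZE, GFP_KERNEL);
            if (!bufs[i])
                return -ENOMEM;

            /* Streaming mapping: the NIC bus-masters received packets here. */
            buf_dma[i] = dma_map_single(dev, bufs[i], RX_BUF_SIZE, DMA_FROM_DEVICE);
            if (dma_mapping_error(dev, buf_dma[i]))
                return -ENOMEM;

            (*ring)[i].buf_addr = cpu_to_le64(buf_dma[i]);
            (*ring)[i].len      = cpu_to_le16(RX_BUF_SIZE);
            (*ring)[i].flags    = 0;
        }
        /* The driver would then write *ring_dma and RX_RING_SIZE into NIC registers. */
        return 0;
    }
    ```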

    3) I assume the hard-disk too uses bus-mastering to transfer blocks.

    The hard disk itself is not connected to any bus with bus mastering (real or emulated). Usually a hard disk has a PATA, SATA, or SCSI interface, which is connected to some disk controller or HBA: a PCI-to-SATA or PCIe-to-SATA controller, a controller integrated into the south bridge, or some other variant. The controller uses DMA (and bus mastering) to execute read and write descriptors out of its rings. Data for write descriptors is fetched from system memory with DMA reads; data read from the disk is DMA-written into system memory. The DMA engine for these operations is in the disk controller.
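    As an illustration of such descriptors, an AHCI SATA controller, for example, is told where to move data via a physical region descriptor table (PRDT). The sketch below follows the AHCI entry layout as I understand the spec; it is illustrative, not copied from any real driver header:

    ```c
    /* Sketch of an AHCI PRDT entry (per my reading of the AHCI spec).
     * The SATA controller bus-masters disk data to or from the buffer
     * whose bus address is placed in dba/dbau. Illustrative only. */
    #include <stdint.h>

    struct ahci_prdt_entry {
        uint32_t dba;        /* data base address, lower 32 bits (word aligned) */
        uint32_t dbau;       /* data base address, upper 32 bits */
        uint32_t reserved;
        uint32_t dbc_i;      /* bits 21:0 = byte count - 1, bit 31 = IRQ on completion */
    };

    /* The driver DMA-maps a buffer, fills one or more PRDT entries with the bus
     * address and length, and the controller's DMA engine moves the blocks. */
    static void fill_prdt(struct ahci_prdt_entry *e, uint64_t bus_addr, uint32_t len)
    {
        e->dba      = (uint32_t)bus_addr;
        e->dbau     = (uint32_t)(bus_addr >> 32);
        e->reserved = 0;
        e->dbc_i    = ((len - 1) & 0x3fffff) | (1u << 31);  /* interrupt when done */
    }
    ```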

    NVMe (NVM Express) solid-state disks (SSDs) are connected directly to PCIe, and the NVMe controller in each such disk is capable of doing DMA reads and writes, programmed with NVMe descriptors placed in NVMe queues by the nvme driver. (Overview of NVMe: page 7 for queues, page 10 for submission queues for I/O commands; page 21 for the controller "doorbell" registers, which the driver writes to notify the NVMe controller that new commands were posted to a queue; page 45 for the Read I/O command, page 73 for the Write command.)
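    For a concrete picture, here is a hedged sketch of the 64-byte NVMe submission queue entry for a Read command, per my reading of the NVMe spec (field names are mine). The PRP entries hold the bus addresses the SSD's own DMA engine writes the read data into:

    ```c
    /* Sketch of a 64-byte NVMe submission queue entry for Read (opcode 0x02).
     * Layout follows my reading of the NVMe spec; illustrative only. */
    #include <stdint.h>

    struct nvme_sqe_read {
        uint8_t  opcode;      /* 0x02 = Read */
        uint8_t  flags;       /* fused operation / PRP vs SGL selection */
        uint16_t cid;         /* command identifier, echoed in the completion */
        uint32_t nsid;        /* namespace ID */
        uint64_t rsvd;
        uint64_t mptr;        /* metadata pointer (unused here) */
        uint64_t prp1;        /* bus address of the first data page */
        uint64_t prp2;        /* second page or PRP list, for larger transfers */
        uint64_t slba;        /* CDW10-11: starting LBA */
        uint32_t nlb_etc;     /* CDW12: bits 15:0 = number of blocks - 1 */
        uint32_t cdw13_15[3];
    };

    /* The driver fills such an entry in a submission queue in system memory and
     * writes the queue's doorbell register; the SSD then fetches the entry and
     * DMAs the data to prp1/prp2 on its own (first-party DMA). */
    ```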

    In this context, when do these devices/drives use bus-mastering, vs 3rd party DMA?

    In the context of modern PCIe systems, every device is connected only to a single port over a high-speed full-duplex link (directly to a host PCIe port, or to a bridge or switch). There is no electrical signal sharing as in old PCI, so there is no real need to call any operation "bus mastering". When a device wants to read or write system (or any other) memory, it sends Memory Read or Memory Write packets (TLPs) onto the link. To do anything useful, devices have their own integrated DMA engines, and the driver commands the device's controller to execute operations against the right memory addresses (allocated by the driver and passed in command descriptors).

    bus-mastering was only used by the device to transfer to the buffer created by the DMA subsystem and it was always to or from the main-memory, right?

    Yes, drivers usually use main (system) memory as the target of DMA operations.

    P2P Dma is a provision that allows one to bypass the main memory altogether

    P2P DMA is a very new and rare feature. It can be used with a small number of expensive, special high-speed PCI Express controllers, but probably not in a typical desktop system.

    In P2P (peer-to-peer) DMA, a special driver may point one device at the memory resources of another (compatible) device in the same PCI Express hierarchy and have it DMA directly to or from them. P2P DMA can be useful for moving large amounts of data; system memory may still hold some metadata or descriptors, so perhaps 95% or 99% of the data moves via P2P DMA while the remaining 5% or 1% of metadata stays in system memory.

    For example, drivers have been proposed that can copy data from one NVMe SSD to another, or to a compute GPU, with P2P DMA:
    * https://www.snia.org/sites/default/files/SDC15_presentations/nvme_fab/StephenBates_Donard_NVM_Express_Peer-2_Peer.pdf
    * https://www.usenix.org/sites/default/files/conference/protected-files/atc17_slides_bergman.pdf
    * https://lwn.net/Articles/764716/ (Peer-to-Peer PCI memory in the kernel, with initial support for the NVMe fabrics target subsystem)

    Some info on CMB and PMR buffers:
    * https://www.flashmemorysummit.com/English/Collaterals/Proceedings/2018/20180808_SOFT-201-1_Bates.pdf
    * https://nvmexpress.org/wp-content/uploads/Session-2-Enabling-the-NVMe-CMB-and-PMR-Ecosystem-Eideticom-and-Mell....pdf

    A PCIe switch (as I understand it) will be configured to detect the receiver of a P2P memory read or write operation (TLP) and to route the TLP packet directly to the right device, not to the host port. Memory read/write (DMA) TLPs are probably routed by the access address ("Table 3-5. PCI Express TLP Variants And Routing Options").

    A generic P2P DMA description and some issues with enabling it: https://spdk.io/doc/peer_2_peer.html

    Peer-2-Peer (P2P) is the concept of DMAing data directly from one PCI End Point (EP) to another without using a system memory buffer. The most obvious example of this from an SPDK perspective is using a NVMe Controller Memory Buffer (CMB) to enable direct copies of data between two NVMe SSDs. In some systems when performing peer-2-peer DMAs between PCIe EPs that are directly connected to the Root Complex (RC) the DMA may fail or the performance may not be great. Basically your milage may vary. It is recommended that you use a PCIe switch (such as those provided by Broadcom or Microsemi) as that is know to provide good performance.

    Some linux kernel documentation for p2p dma support in drivers: https://www.kernel.org/doc/html/latest/driver-api/pci/p2pdma.html

    The PCI bus has pretty decent support for performing DMA transfers between two devices on the bus. This type of transaction is henceforth called Peer-to-Peer (or P2P). However, there are a number of issues that make P2P transactions tricky to do in a perfectly safe way. Therefore, as of this writing, the kernel only supports doing P2P when the endpoints involved are all behind the same PCI bridge, as such devices are all in the same PCI hierarchy domain, and the spec guarantees that all transactions within the hierarchy will be routable
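    Following that kernel documentation, the provider/client split looks roughly like the sketch below. The helper names come from include/linux/pci-p2pdma.h, but exact signatures differ between kernel versions, so treat this as an outline rather than a drop-in driver:

    ```c
    /* Hedged sketch of the in-kernel p2pdma API described above; signatures
     * vary between kernel versions. */
    #include <linux/pci.h>
    #include <linux/pci-p2pdma.h>

    /* Provider side: a device (e.g. an NVMe SSD with a CMB) exposes part of a
     * BAR as P2P-usable memory and publishes it. */
    static int expose_p2p_mem(struct pci_dev *provider, int bar, size_t size)
    {
        int rc = pci_p2pdma_add_resource(provider, bar, size, 0);
        if (rc)
            return rc;
        pci_p2pmem_publish(provider, true);
        return 0;
    }

    /* Client side: another driver finds a suitable provider in the same PCIe
     * hierarchy and allocates a buffer in the peer's memory, which can then be
     * used as the DMA target of a second device instead of system RAM. */
    static void *grab_p2p_buffer(struct device *client, struct pci_dev **provider,
                                 size_t size)
    {
        *provider = pci_p2pmem_find(client);   /* NULL if no compatible peer */
        if (!*provider)
            return NULL;
        return pci_alloc_p2pmem(*provider, size);
    }
    ```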