What is the advantage of direct memory access?

If the CPU cannot access the bus during DMA transfers, and so must wait for the bus to be released, how does using DMA provide any performance advantage?

Solution

There may be bus contention, but DMA is still likely to be faster than the alternative, which is typically interrupt-driven I/O with a large software overhead (often an interrupt for every byte or word transferred). A DMA transfer will not normally use the maximum memory bandwidth (except perhaps for memory-to-memory transfers), because peripheral devices tend to be far slower than the memory. The DMA controller's accesses are interleaved with the CPU's, so the CPU typically stalls only for the individual bus cycles that DMA uses rather than for the whole transfer.
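As a rough illustration of that software overhead, the sketch below counts the CPU cycles per second spent servicing interrupts for a given data rate. `irq_cycles_per_second` is a hypothetical helper, and the figures used with it afterwards (a 1 MB/s stream, 100 cycles per interrupt, 512-byte DMA blocks, a 100 MHz core) are assumptions chosen purely for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* CPU cycles per second spent servicing interrupts for a data stream.
 * bytes_per_irq is 1 for byte-at-a-time interrupt-driven I/O, or the
 * block size when DMA raises one interrupt per completed block.
 * cycles_per_irq is an assumed cost covering entry, handler and exit. */
static uint64_t irq_cycles_per_second(uint64_t bytes_per_second,
                                      uint64_t bytes_per_irq,
                                      uint64_t cycles_per_irq)
{
    assert(bytes_per_irq > 0);

    /* Round up: a partially filled block still costs one interrupt. */
    uint64_t irqs_per_second =
        (bytes_per_second + bytes_per_irq - 1) / bytes_per_irq;

    return irqs_per_second * cycles_per_irq;
}
```

With the assumed figures, `irq_cycles_per_second(1000000, 1, 100)` gives 100,000,000 cycles per second, which would consume an entire 100 MHz core, whereas `irq_cycles_per_second(1000000, 512, 100)` gives 195,400, about 0.2% of it. The estimate ignores the bus cycles taken by the DMA transfers themselves.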

Moreover, bus contention is not a given. Many microcontrollers segment their memory across separate buses, so the CPU can access memory on one bus while a DMA operation accesses memory on another. For example, Harvard architecture devices have independent buses for code and data memory, so they can fetch op-codes and data simultaneously. Because not every instruction needs a memory operand, the data memory is less heavily loaded, which leaves more of its bandwidth free for DMA.

Further, some microcontrollers, such as many in the STM32 series (those based on Cortex-M3, M4 or M7 are also Harvard architecture), place segments of their on-chip SRAM on separate buses (with the external memory interface on yet another bus). Typically one SRAM segment is smaller than the others, 4K or 16K perhaps, and is ideal to reserve exclusively for DMA buffers. Using the memory in this way can improve performance significantly by avoiding bus contention.

Another feature of some STM32 parts is core-coupled memory (CCM), which is not available for DMA or bit-banding. If it is used for most CPU data processing, however, the other on-chip SRAM is left free for DMA, further reducing bus contention.
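Because a buffer that ends up in CCM cannot be reached by the DMA controller, a debug-time check before starting a transfer can catch the mistake early. The following is a minimal sketch only: the function names are hypothetical, and the CCM window (64 KB at 0x10000000) is an assumption based on the STM32F405/407 memory map, so check the reference manual for your part.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed CCM window (STM32F405/407 memory map); verify for your part. */
#define CCM_BASE 0x10000000u
#define CCM_SIZE 0x00010000u

/* Returns true if no byte of [addr, addr + len) lies in the CCM window. */
static bool buffer_is_dma_reachable(uintptr_t addr, size_t len)
{
    const uintptr_t ccm_start = (uintptr_t)CCM_BASE;
    const uintptr_t ccm_end = ccm_start + CCM_SIZE;   /* exclusive */

    if (len == 0) {
        return true;                 /* nothing to transfer */
    }
    if (len > UINTPTR_MAX - addr) {
        return false;                /* range wraps around: reject it */
    }

    uintptr_t end = addr + len;      /* exclusive */
    return end <= ccm_start || addr >= ccm_end;
}

/* Debug-build guard to call before handing a buffer to a DMA driver. */
static void assert_dma_reachable(const void *buf, size_t len)
{
    assert(buffer_is_dma_reachable((uintptr_t)buf, len));
}
```

A driver might call `assert_dma_reachable(buf, len)` on entry to its transfer routine, which is particularly useful when the stack has been placed in CCM and a local array is passed in by mistake.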

Other microprocessors may have similar memory segmentation. To take advantage of such a memory architecture you typically need a custom linker script, and possibly section directives in the code (attributes or pragmas), so that segments can be identified and objects explicitly assigned to them. The details are toolchain specific.
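With GCC or Clang, for instance, the assignment might look something like the sketch below. The section names `.dma_buffers` and `.ccm_data` are hypothetical: they only have the intended effect if your own linker script defines matching output sections and maps them onto the small SRAM segment and the CCM respectively. The buffer names and sizes are likewise illustrative, and other toolchains use different syntax.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* GCC/Clang attribute syntax for ELF targets. The section names are
 * assumptions and must match sections defined in your linker script.
 * On other toolchains the macros expand to nothing. */
#if defined(__GNUC__) && defined(__ELF__)
#define DMA_BUFFER __attribute__((section(".dma_buffers"), aligned(4)))
#define CCM_DATA   __attribute__((section(".ccm_data")))
#else
#define DMA_BUFFER
#define CCM_DATA
#endif

#define UART_RX_BUFFER_SIZE 512u
#define SAMPLE_COUNT        64u

/* DMA target: intended for the SRAM segment reserved for DMA buffers.
 * Startup code typically zeroes only .bss, so do not rely on the
 * initial contents of objects placed in custom sections. */
static uint8_t uart_rx_dma_buffer[UART_RX_BUFFER_SIZE] DMA_BUFFER;

/* CPU-only working data: intended for core-coupled memory. */
static int32_t samples[SAMPLE_COUNT] CCM_DATA;

/* Copy up to SAMPLE_COUNT received bytes from the DMA buffer into the
 * CPU-side working array, widening each byte. Returns the number copied.
 * A real driver would first synchronise with DMA transfer completion. */
static size_t unpack_rx_bytes(size_t count)
{
    assert(count <= UART_RX_BUFFER_SIZE);   /* caller bug if violated */

    size_t n = count < SAMPLE_COUNT ? count : SAMPLE_COUNT;
    for (size_t i = 0; i < n; ++i) {
        samples[i] = uart_rx_dma_buffer[i];
    }
    return n;
}
```

Keeping the placement behind macros confines the toolchain-specific part to one place, so the same source still builds on a host compiler for testing. Checking the linker map file afterwards is a simple way to confirm that each object landed in the region you intended.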