Xilinx, Zynq, AXI4 interconnect. What are the performance implications of configuring register slice and data fifo options?

Consider an AXI4 Interconnect on the PL (FPGA) side.

When I double-click the block to see the available options, there is a Slave Interfaces tab containing the following options.

What is the purpose of enabling register slice? Does outer refer to the L2 cache? And what does Auto mean?

What is the purpose of enabling the Data FIFO? Is it for burst transactions? Doesn't the DMA controller have its own FIFO?

Solution

Register Slices

Enabling register slices (AXI Interconnect v2.1, pg. 93) inserts a pipeline stage between your AXI master and slave connections to break a critical timing path, at the cost of some added latency. It has nothing to do with the L2 cache: "Outer" refers to where the slice is placed in the SI coupler hierarchy. The available options are as follows (pg. 113):

If None (0) is selected, no register slice is inserted.
If Outer (1) is selected, a register slice is inserted at the SI side of the SI coupler cells hierarchy.
If Auto (2) is selected, a register slice is automatically inserted in the SI coupler cells hierarchy if SI coupler cells with common timing paths are detected.
If Outer and Auto (3) is selected, a register slice is inserted at the SI side of the SI coupler cells hierarchy and an additional register slice can be inserted if SI coupler cells with common timing paths are detected.
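The four values above read like two independent bits (Outer = 1, Auto = 2, both = 3). That is my reading of the list, not something the product guide states in those terms. A minimal Python sketch that decodes a value under that assumption (the names RegSlice and describe are made up for illustration and are not part of any Xilinx tool):

```python
from enum import IntFlag


class RegSlice(IntFlag):
    """Register slice option values as listed above (0 to 3)."""
    NONE = 0
    OUTER = 1   # slice at the SI side of the SI coupler cells hierarchy
    AUTO = 2    # slice inserted only if common timing paths are detected


def describe(value: int) -> str:
    """Return a short description of a register slice option value."""
    mode = RegSlice(value)
    parts = []
    if mode & RegSlice.OUTER:
        parts.append("slice at the SI side of the SI coupler hierarchy")
    if mode & RegSlice.AUTO:
        parts.append("extra slice if common timing paths are detected")
    return " + ".join(parts) if parts else "no register slice"


if __name__ == "__main__":
    for v in range(4):
        print(v, "->", describe(v))
```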

The AXI Reference Guide provides guidelines for AXI system optimization on page 91. For example, it states: "Large and complex IP blocks such as processors, DDR3 memory controllers, and PCIe bridges are good candidates for having register slices enabled. The register slice breaks timing paths and allows more freedom for Place and Route (PAR) tools to move a large IP block away from the congestion of the interconnect core and other IP logic." Nonetheless, I recommend reading that whole section, as excessive use of register slices may be counterproductive; the right choice really depends on your system design.
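To see why register slices can help or hurt, here is a toy Python calculation. All numbers are hypothetical, and the model assumes that a slice splits the critical path evenly, adds one clock cycle of latency, and has no register overhead. Real timing results come from the Vivado timing reports, not from a formula like this.

```python
def fmax_mhz(longest_path_ns: float) -> float:
    """Maximum clock frequency allowed by the longest register-to-register path."""
    return 1000.0 / longest_path_ns


def transaction_latency_ns(cycles: int, clock_mhz: float) -> float:
    """Latency of a transaction that takes `cycles` clock cycles."""
    return cycles * 1000.0 / clock_mhz


# Hypothetical design: 8 ns critical path, 4-cycle transaction latency.
path_ns = 8.0
base_cycles = 4

clock_before = fmax_mhz(path_ns)        # limited by the full path
clock_after = fmax_mhz(path_ns / 2)     # assumed even split by one slice

if __name__ == "__main__":
    print("no slice:          ", transaction_latency_ns(base_cycles, clock_before), "ns")
    print("slice, faster clk: ", transaction_latency_ns(base_cycles + 1, clock_after), "ns")
    print("slice, same clk:   ", transaction_latency_ns(base_cycles + 1, clock_before), "ns")
```

If the slice lets you raise the clock, the extra cycle is more than paid for. If the design already met timing at the clock you need, the slice only adds latency and flip-flops, which is one way excessive use becomes counterproductive.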

Data FIFO

The purpose of enabling the Data FIFO is to provide data buffering and enable higher throughput. The 32 deep option provides a 32-deep LUT-RAM based FIFO (data channel only), while 512 deep (packet mode) provides a 512-deep block RAM based packet FIFO. Packet FIFO mode also adds a 32-deep FIFO on the corresponding address channel to avoid full/empty stalls in the middle of bursts. In effect, it delays the start of a read or write transaction so that, once the burst begins, it can run without stalling. See AXI Interconnect v2.1, pg. 94, for more in-depth information. The options are as follows:

If parameter value is 0, no data FIFO is inserted.
If parameter value is 1, a 32-deep data FIFO is inserted.
If parameter value is 2, a 512-deep data FIFO is inserted and its packet mode feature is enabled.
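As a rough sizing aid, the sketch below maps the three parameter values to FIFO depth and estimates the data payload storage and how many whole bursts fit. The 64-bit data width is an assumption for the example, the helper names are made up, and the estimate ignores sideband signals stored alongside the data (strobes, last flag and so on).

```python
# Depth in data beats for each Data FIFO parameter value listed above.
DATA_FIFO_DEPTH = {0: 0, 1: 32, 2: 512}


def fifo_payload_bits(option: int, data_width_bits: int) -> int:
    """Approximate data storage in bits (payload only, no sideband signals)."""
    return DATA_FIFO_DEPTH[option] * data_width_bits


def whole_bursts_buffered(option: int, burst_len_beats: int) -> int:
    """How many complete bursts of the given length fit in the FIFO."""
    return DATA_FIFO_DEPTH[option] // burst_len_beats


if __name__ == "__main__":
    width = 64  # assumed data width in bits
    for option in (0, 1, 2):
        print(option, fifo_payload_bits(option, width), "bits")
    # AXI4 allows INCR bursts of up to 256 beats.
    print("max-length bursts in packet FIFO:", whole_bursts_buffered(2, 256))
```

Under these assumptions the 512-deep packet FIFO holds two maximum-length AXI4 bursts, while the 32-deep FIFO cannot hold even one, which fits the idea that packet mode is the option meant for buffering entire bursts.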

Finally, I don't know the exact details of Xilinx's DMA implementation, but I believe the intent of a buffer there is to cover the case where the receiving module is not as fast as the DMA. That is, if the DMA can provide data faster than your module can read it, buffering its output could improve communication speed (and release your DMA sooner in some cases).
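To illustrate that last point, here is a small cycle-by-cycle Python model. It is not a model of the Xilinx DMA or interconnect. It assumes a producer that can push one beat per cycle whenever the FIFO has room, and a consumer that accepts a beat only every other cycle. A depth of 1 stands in for "almost no buffering".

```python
from typing import Callable


def every_other_cycle(cycle: int) -> bool:
    """Assumed consumer behaviour: ready on even cycles only."""
    return cycle % 2 == 0


def cycles_until_producer_done(total_beats: int,
                               fifo_depth: int,
                               consumer_ready: Callable[[int], bool]) -> int:
    """Count cycles until the producer has pushed all of its beats."""
    if fifo_depth < 1:
        raise ValueError("this toy model needs a FIFO depth of at least 1")
    sent = 0
    occupancy = 0
    cycle = 0
    while sent < total_beats:
        # The consumer drains one beat if it is ready and data is available.
        if occupancy > 0 and consumer_ready(cycle):
            occupancy -= 1
        # The producer pushes one beat if the FIFO has room.
        if occupancy < fifo_depth:
            occupancy += 1
            sent += 1
        cycle += 1
    return cycle


if __name__ == "__main__":
    for depth in (1, 32):
        done = cycles_until_producer_done(32, depth, every_other_cycle)
        print(f"depth {depth}: producer finished after {done} cycles")
```

In this model a 32-beat transfer releases the producer after 32 cycles with the 32-deep FIFO, versus 63 cycles with depth 1. The consumer still takes just as long to read everything; the gain is that the producer is free for other work sooner.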