VHDL warning: PAR will not attempt to route this signal

I am learning VHDL and I am on the quest to implement my own FIFO buffer, but I have some problems. Since I want to deploy the code on a Xilinx Spartan 6 device I am using the Xilinx WebPack ISE with the associated VHDL compiler, but I am getting very weird warnings:

WARNING:Par:288 - The signal Mram_buf_mem1_RAMD_D1_O has no load. PAR will not attempt to route this signal.

WARNING:Par:283 - There are 1 loadless signals in this design. This design will cause Bitgen to issue DRC warnings.

Here is my code:

library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

entity FIFO_buffer is
    generic ( BUFFER_SIZE : positive := 4; -- # of words
              WORD_WIDTH : positive := 8); -- # of bits per word

    port ( data_in : in  STD_LOGIC_VECTOR (WORD_WIDTH - 1 downto 0);
           full : out  STD_LOGIC := '0';
           write : in  STD_LOGIC;
           data_out : out  STD_LOGIC_VECTOR (WORD_WIDTH - 1 downto 0);
           empty : out  STD_LOGIC := '1';
           read : in  STD_LOGIC);
end FIFO_buffer;

architecture arch of FIFO_buffer is
type ram_t is array (0 to BUFFER_SIZE - 1) of std_logic_vector(WORD_WIDTH - 1 downto 0); 
signal buf_mem : ram_t := (others => (others=>'0'));

signal read_idx : integer range 0 to BUFFER_SIZE  - 1 := 0;
signal write_idx : integer range 0 to BUFFER_SIZE - 1 := 0;
signal buf_full : std_logic := '0';
signal buf_empty : std_logic := '0';

begin
    writing_data: process(write)
    begin
        if(rising_edge(write)) then
            if(buf_full = '0') then
                buf_mem(write_idx) <= data_in;

                write_idx <= write_idx + 1;

                if(write_idx = read_idx) 
                    then buf_full <= '1';
                    else buf_full <= '0'; 
                end if;
            end if;
        end if;
    end process;

    reading_data: process(read)
    begin
        if(rising_edge(read)) then
            if(buf_empty = '0') then
                data_out <= buf_mem(read_idx);

                read_idx <= read_idx + 1;

                if(read_idx = write_idx) 
                    then buf_empty <= '1'; 
                    else buf_empty <= '0'; 
                end if;
            end if; 
        end if;
    end process;

    full <= buf_full;
    empty <= buf_empty;

end arch;

The warning seems to be caused by the data_out <= buf_mem(read_idx); line in the reading_data process. Could anyone explain to me the reason for the warning? (I know that my code has some functional problems, but that should not affect the reason for the warning)

P.S. Since I have the code here let me ask one more question. How unwise is it to have a component (such as that FIFO buffer) which is not synchronised with the global clock?

Solution

I'll address your second question first, i.e. "How unwise is it to have a component (such as that FIFO buffer) which is not synchronised with the global clock?"

It depends on your requirements. Usually, you should clock your components, so you have synchronous logic and no weird glitches caused by asynchronous paths.

However, consider what you did here. You have clocked your component: rising_edge(read) and rising_edge(write). You will find in your synthesis report the following:

Primitive and Black Box Usage:
------------------------------
<snip>
# Clock Buffers                    : 2
#      BUFGP                       : 2
<snip>

Clock Information:
------------------
-----------------------------------+------------------------+-------+
Clock Signal                       | Clock buffer(FF name)  | Load  |
-----------------------------------+------------------------+-------+
read                               | BUFGP                  | 11    |
write                              | BUFGP                  | 6     |
-----------------------------------+------------------------+-------+

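This is because rising_edge(read) and rising_edge(write) describe edge-triggered (clocked) processes, not combinational ones, so XST treats read and write as clocks and gives each of them a global clock buffer. This will lead to all kinds of problems. You mention a Xilinx Spartan-6. Unless you happened to assign read and write to pins that form an optimal IOB/BUFG site pair, you will get the following message during placement (usually as an ERROR):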

Place:1109 - A clock IOB / BUFGMUX clock component pair have been found
that are not placed at an optimal clock IOB / BUFGMUX site pair. The clock
IOB component <write> is placed at site <A5>. The corresponding BUFG
component <write_BUFGP/BUFG> is placed at site <BUFGMUX_X2Y9>. There is only
a select set of IOBs that can use the fast path to the Clocker buffer, and
they are not being used. You may want to analyze why this problem exists and
correct it.

What this message explains at great length is the following. FPGAs have dedicated routing networks for clocks, which ensure low skew (see Xilinx UG382 for more). However, only specific pins on the FPGA can access this clock network directly. At those pins, the IOB (I/O Buffer) and the BUFG[MUX] ([Multiplexed] Global [Clock] Buffer) are close together, so the signal from the pin can be distributed quickly across the whole FPGA using dedicated clocking resources. You can check placement with the FPGA Editor. For instance, my write pin has to cross half the FPGA before it reaches a global clock buffer. That's a 3.878 ns delay in my case.

The same applies to read, of course. So you see this is a bad idea. You should use dedicated clocking resources for your clocks and synchronize inputs and outputs to them.
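To make that concrete, here is a minimal sketch of how the FIFO could look with a single clock, where read and write become ordinary enable inputs sampled on that clock. The names sync_fifo, clk, wr_en, rd_en and count are my own assumptions, not something from your code, and I have not run this through XST, so treat it as a starting point rather than a finished design.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

entity sync_fifo is
    generic ( BUFFER_SIZE : positive := 4;   -- # of words
              WORD_WIDTH  : positive := 8);  -- # of bits per word
    port ( clk      : in  std_logic;  -- assumed: one clock on a clock-capable pin
           wr_en    : in  std_logic;  -- assumed name: write request, sampled on clk
           rd_en    : in  std_logic;  -- assumed name: read request, sampled on clk
           data_in  : in  std_logic_vector(WORD_WIDTH - 1 downto 0);
           data_out : out std_logic_vector(WORD_WIDTH - 1 downto 0);
           full     : out std_logic;
           empty    : out std_logic);
end sync_fifo;

architecture rtl of sync_fifo is
    type ram_t is array (0 to BUFFER_SIZE - 1) of std_logic_vector(WORD_WIDTH - 1 downto 0);
    signal buf_mem   : ram_t := (others => (others => '0'));
    signal read_idx  : integer range 0 to BUFFER_SIZE - 1 := 0;
    signal write_idx : integer range 0 to BUFFER_SIZE - 1 := 0;
    signal count     : integer range 0 to BUFFER_SIZE := 0;  -- words currently stored
    signal is_full   : std_logic;
    signal is_empty  : std_logic;
begin
    -- Both flags are derived from one counter, so they cannot disagree.
    is_full  <= '1' when count = BUFFER_SIZE else '0';
    is_empty <= '1' when count = 0 else '0';
    full     <= is_full;
    empty    <= is_empty;

    process(clk)
        variable do_wr, do_rd : boolean;
    begin
        if rising_edge(clk) then
            do_wr := (wr_en = '1') and (is_full = '0');
            do_rd := (rd_en = '1') and (is_empty = '0');

            if do_wr then
                buf_mem(write_idx) <= data_in;
                -- Wrap explicitly so the index never leaves its declared range.
                if write_idx = BUFFER_SIZE - 1 then
                    write_idx <= 0;
                else
                    write_idx <= write_idx + 1;
                end if;
            end if;

            if do_rd then
                data_out <= buf_mem(read_idx);
                if read_idx = BUFFER_SIZE - 1 then
                    read_idx <= 0;
                else
                    read_idx <= read_idx + 1;
                end if;
            end if;

            -- A simultaneous read and write leaves the fill level unchanged.
            if do_wr and not do_rd then
                count <= count + 1;
            elsif do_rd and not do_wr then
                count <= count - 1;
            end if;
        end if;
    end process;
end rtl;
```

With only one clock there is a single BUFG to worry about, and full and empty are updated in the same domain that reads them, so no clock domain crossing is involved.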

Now, on to your main question. You have to be aware of what your HDL actually describes. You have two distinct processes, each with its own clock (read; write), that access the same memory. You have two distinct addresses as well (write_idx; read_idx).

Hence, the XST Synthesizer (that ISE uses) inferred a dual-port RAM. Because the depth as well as element width are both small, it inferred a distributed dual-port RAM. Check your synthesis report; it will say:

Found 4x8-bit dual-port RAM <Mram_buf_mem> for signal <buf_mem>.
<snip>
INFO:Xst:3231 - The small RAM <Mram_buf_mem> will be implemented on LUTs in order to maximize performance and save block RAM resources. If you want to force its implementation on block, use option/constraint ram_style.
-----------------------------------------------------------------------
| ram_type           | Distributed                         |          |
-----------------------------------------------------------------------
| Port A                                                              |
|     aspect ratio   | 4-word x 8-bit                      |          |
|     clkA           | connected to signal <write>         | rise     |
|     weA            | connected to signal <full>          | low      |
|     addrA          | connected to signal <write_idx>     |          |
|     diA            | connected to signal <data_in>       |          |
-----------------------------------------------------------------------
| Port B                                                              |
|     aspect ratio   | 4-word x 8-bit                      |          |
|     addrB          | connected to signal <read_idx>      |          |
|     doB            | connected to internal node          |          |
-----------------------------------------------------------------------
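As an aside, the ram_style constraint mentioned in the INFO message above can also be attached to the memory signal from within VHDL. The following is only a sketch: the entity name ram_style_demo and its port names are made up for illustration, and whether XST actually honours the request for such a tiny memory is something you would have to confirm in your own synthesis report.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

entity ram_style_demo is  -- hypothetical entity, for illustration only
    port ( clk   : in  std_logic;
           we    : in  std_logic;
           waddr : in  integer range 0 to 3;
           raddr : in  integer range 0 to 3;
           din   : in  std_logic_vector(7 downto 0);
           dout  : out std_logic_vector(7 downto 0));
end ram_style_demo;

architecture rtl of ram_style_demo is
    type ram_t is array (0 to 3) of std_logic_vector(7 downto 0);
    signal buf_mem : ram_t := (others => (others => '0'));

    -- XST synthesis attribute; "distributed" would request LUT RAM instead.
    attribute ram_style : string;
    attribute ram_style of buf_mem : signal is "block";
begin
    process(clk)
    begin
        if rising_edge(clk) then
            if we = '1' then
                buf_mem(waddr) <= din;
            end if;
            -- Registered (synchronous) read, which block RAM requires.
            dout <= buf_mem(raddr);
        end if;
    end process;
end rtl;
```

For a 4x8 memory the default choice of distributed RAM is perfectly reasonable; this only shows where the knob is.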

When you now look at the technology schematic, you will see that XST inferred three instances: Mram_buf_mem1, Mram_buf_mem21 and Mram_buf_mem22. That is what I get, anyway; yours might differ.

Mram_buf_mem1 stores data_in(5:0), while data_in(6) and data_in(7) use Mram_buf_mem21 and Mram_buf_mem22, respectively. This is just an artifact of the design not being properly constrained (what's the clock period of read and write? etc.)

So, basically your message above

WARNING:Par:288 - The signal Mram_buf_mem1_RAMD_D1_O has no load. PAR will not attempt to route this signal.

means that one of the outputs of the inferred dual-port distributed RAM (D1_O) is not being used: it drives no logic or flip-flops. Judging by the name (RAMD), it probably belongs to the write-side port of the primitive, whose output your design never reads. Therefore, the Place and Route (PAR) step will not even attempt to route it. Given everything above, we can safely assume that this doesn't matter and won't affect your FIFO at all.

However, what will matter is the following: you did nothing to constrain the paths between your two clock domains (the read domain and the write domain). This means you might run into issues where write_idx changes while a read is being performed, and vice versa. This might leave you stuck in a state where full is never deasserted or empty is never asserted, because you lack synchronization logic for the signals that cross between the clock domains. XST will not insert this logic for you. You can check for these types of errors using the Asynchronous Delay Report and the Clock Region Report.
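For reference, the most basic piece of such synchronization logic is a two-flip-flop synchronizer for a single bit. The sketch below uses names I made up (sync_2ff, async_in, sync_out) and is meant to illustrate the idea, not to be a complete fix for your FIFO.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

entity sync_2ff is  -- hypothetical helper, for illustration only
    port ( clk      : in  std_logic;   -- clock of the destination domain
           async_in : in  std_logic;   -- bit generated in another clock domain
           sync_out : out std_logic);  -- copy that is safe to use in the clk domain
end sync_2ff;

architecture rtl of sync_2ff is
    signal meta   : std_logic := '0';  -- first stage, may go metastable
    signal stable : std_logic := '0';  -- second stage, has had a full cycle to settle
begin
    process(clk)
    begin
        if rising_edge(clk) then
            meta   <= async_in;
            stable <= meta;
        end if;
    end process;

    sync_out <= stable;
end rtl;
```

Note that this is only valid for one bit at a time. Multi-bit values such as write_idx and read_idx cannot simply be passed through one synchronizer per bit, because the bits may arrive in different cycles; dual-clock FIFOs typically Gray-code the pointers first, so that only one bit changes per increment.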

Now, if you're just getting started with the world of FPGAs, you might want to play around a bit with inference of primitives vs. instantiation of primitives. Check the Spartan-6 Libraries Guide for HDL Designs to see which VHDL constructs will cause XST to infer, e.g., a RAM, a FIFO or a flip-flop, and which constructs will cause it to infer weird and cryptic logic because of some unrealistic inferred timing/area constraints.

Finally, try to have synchronous logic as much as possible and properly constrain your design. Also, sorry for the long write-up if you were just looking for an easy two-liner...