I've been having this debate for years... What's the correct way to infer a single-port RAM with synchronous read?
Suppose the interface for my inferred memory in VHDL is:
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;
entity sram1 is
  generic(
    aw :integer := 8; --address width of memory
    dw :integer := 8 --data width of memory
  );
  port(
    --arm clock
    aclk :in std_logic;
    aclear :in std_logic;
    waddr :in std_logic_vector(aw-1 downto 0);
    wdata :in std_logic_vector(dw-1 downto 0);
    wen :in std_logic;
    raddr :in std_logic_vector(aw-1 downto 0);
    rdata :out std_logic_vector(dw-1 downto 0)
  );
end entity;
Is it this way: Door #1
-- I LIKE THIS ONE
architecture rtl of sram1 is
  constant mem_len :integer := 2**aw;
  type mem_type is array (0 to mem_len-1) of std_logic_vector(dw-1 downto 0);
  signal block_ram : mem_type := (others => (others => '0'));
begin
  process(aclk)
  begin
    if (rising_edge(aclk)) then
      if (wen = '1') then
        block_ram(to_integer(unsigned(waddr))) <= wdata(dw-1 downto 0);
      end if;
      -- QUESTION: REGISTERING THE READ DATA (ALL OUTPUT REGISTERED)?
      rdata <= block_ram(to_integer(unsigned(raddr)));
    end if;
  end process;
end architecture;
Or this way: Door #2
-- TEXTBOOKS LIKE THIS ONE
architecture rtl of sram1 is
  constant mem_len :integer := 2**aw;
  type mem_type is array (0 to mem_len-1) of std_logic_vector(dw-1 downto 0);
  signal block_ram : mem_type := (others => (others => '0'));
  signal raddr_dff : std_logic_vector(aw-1 downto 0);
begin
  process(aclk)
  begin
    if (rising_edge(aclk)) then
      if (wen = '1') then
        block_ram(to_integer(unsigned(waddr))) <= wdata(dw-1 downto 0);
      end if;
      -- QUESTION: REGISTERING THE READ ADDRESS?
      raddr_dff <= raddr;
    end if;
  end process;
  -- QUESTION: HOT ADDRESS SELECTION OF DATA
  rdata <= block_ram(to_integer(unsigned(raddr_dff)));
end architecture;
I'm a fan of the first version because I think it's good practice to register all of the outputs of a VHDL module. However, many textbooks list the latter version as the correct way to infer a single-port RAM with synchronous read.
Does it really matter from a Xilinx or Altera synthesis point of view, as long as you have already taken into account the difference between delaying the data versus the address (and determined that it doesn't matter for your application)?
I mean... they both still give you block RAMs in the FPGA, right?
Or does one give you LUTs and the other block RAMs?
Which would infer better timing and better capacity in an FPGA, Door #1 or Door #2?
Unfortunately, the synthesis tool vendors have written their RAM inference so that it typically recognizes both styles, regardless of how the RAM is physically implemented in the FPGA in question. So even if you describe a registered output, the synthesis tool may silently ignore that and infer a RAM with registered inputs instead. The two are not functionally equivalent, so this may actually lead to undesired behaviour, particularly in the case of dual-port RAMs.
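To make the "not functionally equivalent" point concrete, here is a small simulation-only sketch. It is not either of the architectures from the question as-is: it puts both read styles on one shared memory so they can be compared side by side. The names read_style_tb, rdata_1 and rdata_2 are made up for this example. The stimulus writes and reads the same address in the same clock cycle, which is where the two styles differ in simulation.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical testbench (names are illustrative, not from the question).
entity read_style_tb is
end entity;

architecture sim of read_style_tb is
  constant aw : integer := 4;
  constant dw : integer := 8;
  type mem_type is array (0 to 2**aw-1) of std_logic_vector(dw-1 downto 0);
  signal mem       : mem_type := (others => (others => '0'));
  signal clk       : std_logic := '0';
  signal wen       : std_logic := '0';
  signal waddr     : std_logic_vector(aw-1 downto 0) := (others => '0');
  signal raddr     : std_logic_vector(aw-1 downto 0) := (others => '0');
  signal wdata     : std_logic_vector(dw-1 downto 0) := (others => '0');
  signal raddr_dff : std_logic_vector(aw-1 downto 0) := (others => '0');
  signal rdata_1   : std_logic_vector(dw-1 downto 0); -- Door #1: registered data
  signal rdata_2   : std_logic_vector(dw-1 downto 0); -- Door #2: registered address
  signal done      : boolean := false;
begin
  -- 10 ns clock that stops once the stimulus is finished
  clock : process
  begin
    while not done loop
      clk <= '0'; wait for 5 ns;
      clk <= '1'; wait for 5 ns;
    end loop;
    wait;
  end process;

  -- One memory, both read styles
  ram : process(clk)
  begin
    if rising_edge(clk) then
      if wen = '1' then
        mem(to_integer(unsigned(waddr))) <= wdata;
      end if;
      rdata_1   <= mem(to_integer(unsigned(raddr))); -- samples the OLD contents
      raddr_dff <= raddr;
    end if;
  end process;
  rdata_2 <= mem(to_integer(unsigned(raddr_dff)));   -- follows the NEW contents

  stim : process
  begin
    -- Write 0xAA to address 3 while reading address 3 in the same cycle
    waddr <= x"3"; raddr <= x"3"; wdata <= x"AA"; wen <= '1';
    wait until rising_edge(clk);
    wen <= '0';
    wait until falling_edge(clk);
    assert rdata_1 = x"00"
      report "registered-data read did not return the old value" severity error;
    assert rdata_2 = x"AA"
      report "registered-address read did not return the new value" severity error;
    done <= true;
    wait;
  end process;
end architecture;
```

In simulation the registered-data read returns the old contents for one cycle, while the registered-address read already shows the freshly written value. What the hardware does depends on how the tool maps the description onto the physical RAM, which is exactly the pitfall described above.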
To avoid this pitfall, you can add vendor-specific attributes telling the synthesis tool exactly which kind of RAM you need.
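As a rough sketch of what such an attribute looks like, assuming the Xilinx ram_style attribute (the Altera/Intel counterpart, ramstyle, is shown commented out because its valid values depend on the device family). The entity name sram1_attr is made up for this example, and the attribute mainly steers which RAM resource is used, so check the synthesis report to see what was actually inferred.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical variant of the question's entity with a synthesis attribute.
entity sram1_attr is
  generic(
    aw : integer := 8; -- address width of memory
    dw : integer := 8  -- data width of memory
  );
  port(
    aclk  : in  std_logic;
    waddr : in  std_logic_vector(aw-1 downto 0);
    wdata : in  std_logic_vector(dw-1 downto 0);
    wen   : in  std_logic;
    raddr : in  std_logic_vector(aw-1 downto 0);
    rdata : out std_logic_vector(dw-1 downto 0)
  );
end entity;

architecture rtl of sram1_attr is
  type mem_type is array (0 to 2**aw-1) of std_logic_vector(dw-1 downto 0);
  signal block_ram : mem_type := (others => (others => '0'));
  signal raddr_dff : std_logic_vector(aw-1 downto 0);

  -- Xilinx: request block RAM ("distributed" would request LUT RAM instead)
  attribute ram_style : string;
  attribute ram_style of block_ram : signal is "block";

  -- Altera/Intel equivalent; the value is family-specific (assumption: M9K)
  -- attribute ramstyle : string;
  -- attribute ramstyle of block_ram : signal is "M9K";
begin
  process(aclk)
  begin
    if rising_edge(aclk) then
      if wen = '1' then
        block_ram(to_integer(unsigned(waddr))) <= wdata;
      end if;
      raddr_dff <= raddr; -- registered read address
    end if;
  end process;

  rdata <= block_ram(to_integer(unsigned(raddr_dff)));
end architecture;
```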
In general, most FPGAs have mandatory registered inputs on the physical RAM, and can add an additional optional register on the output. So using the coding style with registered inputs will probably make simulation match reality, which is typically a good thing.
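If you want registered inputs and still want a registered output, one way to describe that structure is sketched below: the read address is registered first, and the read data is registered a second time. The entity name sram1_pipe is made up for this example. Note that the read latency becomes two clock cycles, and whether the second register is absorbed into the block RAM's optional output register is up to the tool and the device, so treat that as an assumption to verify in the synthesis report.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical variant: registered read address plus an output register.
entity sram1_pipe is
  generic(
    aw : integer := 8; -- address width of memory
    dw : integer := 8  -- data width of memory
  );
  port(
    aclk  : in  std_logic;
    waddr : in  std_logic_vector(aw-1 downto 0);
    wdata : in  std_logic_vector(dw-1 downto 0);
    wen   : in  std_logic;
    raddr : in  std_logic_vector(aw-1 downto 0);
    rdata : out std_logic_vector(dw-1 downto 0)
  );
end entity;

architecture rtl of sram1_pipe is
  type mem_type is array (0 to 2**aw-1) of std_logic_vector(dw-1 downto 0);
  signal block_ram : mem_type := (others => (others => '0'));
  signal raddr_dff : std_logic_vector(aw-1 downto 0) := (others => '0');
begin
  process(aclk)
  begin
    if rising_edge(aclk) then
      if wen = '1' then
        block_ram(to_integer(unsigned(waddr))) <= wdata;
      end if;
      -- Stage 1: registered read address (the RAM's input register)
      raddr_dff <= raddr;
      -- Stage 2: registered read data (candidate for the optional output register)
      rdata <= block_ram(to_integer(unsigned(raddr_dff)));
    end if;
  end process;
end architecture;
```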