I have designed two FSMs for CRC purposes. I got the base code (the XOR tree) from an online CRC generator and built the FSMs around it, one for Tx and one for Rx. It works great. When I test either one on its own, I get 200+ MHz. When I test them back to back, my speed drops significantly (below 150 MHz). When I include them in the bigger design with a UART link, it drops even lower (110 MHz). I guess I am missing something very important, but I have no idea what. Do you? I have included the code for the Tx one; the Rx is very similar. And just to be more precise, the limiting factor when they are both tested is frame_to_output (Tx) to curr_state (Rx). I should also say that I have only recently started working with VHDL, so please point out any stupid mistakes in the design below. (PS: FSM state vector encoding is for another discussion, but any input will be more than welcome.)
library IEEE;
use IEEE.std_logic_1164.all;
use ieee.numeric_std.all;
entity Append_Tx_FSM is
port (
CLK : in std_logic; -- system clock
RESETn : in std_logic; -- global reset
APPEND_CRC : in std_logic; -- input flag
TX_FRAME_NO_CRC : in std_logic_vector(39 downto 0); -- 40-bit input frame to attach CRC
CRC_APPENDED : out std_logic; -- output flag
TX_FRAME : out std_logic_vector(47 downto 0) -- 48-bit output frame
);
end Append_Tx_FSM;
--------------------------------------------------------------------------------
architecture arch of Append_Tx_FSM is
--------------------------------------------------------------------------------
-- Finite State Machine declaration
--------------------------------------------------------------------------------
TYPE State IS (idle_st, delay_crc, result);
signal curr_st, next_st : State;
ATTRIBUTE syn_encoding : STRING;
ATTRIBUTE syn_encoding of curr_st : signal is "gray";
--ATTRIBUTE syn_state_machine : boolean;
--ATTRIBUTE syn_state_machine of curr_tx_st : signal is false;
-- state vector
signal crc_result_tx : std_logic_vector (7 downto 0); -- signal for crc computation
signal last_append : std_logic; -- signal for last value of input flag
signal frame_to_append : std_logic_vector (39 downto 0); -- signal for frame construction
signal frame_to_output : std_logic_vector (47 downto 0); -- signal for output
signal appended_crc : std_logic; -- signal for output
--------------------------------------------------------------------------------
begin
--------------------------------------------------------------------------------
-- CRC computation for input data(39:0) Polynomial (1+x^1+x^2+x^3+x^5+x^8) (0x97)
--------------------------------------------------------------------------------
CRC_RESULT_TX(0) <= '0' xor TX_FRAME_NO_CRC(0) xor TX_FRAME_NO_CRC(3) xor TX_FRAME_NO_CRC(5) xor TX_FRAME_NO_CRC(7) xor TX_FRAME_NO_CRC(8) xor TX_FRAME_NO_CRC(9) xor TX_FRAME_NO_CRC(10) xor TX_FRAME_NO_CRC(11) xor TX_FRAME_NO_CRC(12) xor TX_FRAME_NO_CRC(15) xor TX_FRAME_NO_CRC(21) xor TX_FRAME_NO_CRC(22) xor TX_FRAME_NO_CRC(23) xor TX_FRAME_NO_CRC(24) xor TX_FRAME_NO_CRC(25) xor TX_FRAME_NO_CRC(27) xor TX_FRAME_NO_CRC(30) xor TX_FRAME_NO_CRC(31) xor TX_FRAME_NO_CRC(32) xor TX_FRAME_NO_CRC(33) xor TX_FRAME_NO_CRC(35) xor TX_FRAME_NO_CRC(36) xor TX_FRAME_NO_CRC(37) xor TX_FRAME_NO_CRC(38);
CRC_RESULT_TX(1) <= '1' xor TX_FRAME_NO_CRC(0) xor TX_FRAME_NO_CRC(1) xor TX_FRAME_NO_CRC(3) xor TX_FRAME_NO_CRC(4) xor TX_FRAME_NO_CRC(5) xor TX_FRAME_NO_CRC(6) xor TX_FRAME_NO_CRC(7) xor TX_FRAME_NO_CRC(13) xor TX_FRAME_NO_CRC(15) xor TX_FRAME_NO_CRC(16) xor TX_FRAME_NO_CRC(21) xor TX_FRAME_NO_CRC(26) xor TX_FRAME_NO_CRC(27) xor TX_FRAME_NO_CRC(28) xor TX_FRAME_NO_CRC(30) xor TX_FRAME_NO_CRC(34) xor TX_FRAME_NO_CRC(35) xor TX_FRAME_NO_CRC(39);
CRC_RESULT_TX(2) <= '0' xor TX_FRAME_NO_CRC(0) xor TX_FRAME_NO_CRC(1) xor TX_FRAME_NO_CRC(2) xor TX_FRAME_NO_CRC(3) xor TX_FRAME_NO_CRC(4) xor TX_FRAME_NO_CRC(6) xor TX_FRAME_NO_CRC(9) xor TX_FRAME_NO_CRC(10) xor TX_FRAME_NO_CRC(11) xor TX_FRAME_NO_CRC(12) xor TX_FRAME_NO_CRC(14) xor TX_FRAME_NO_CRC(15) xor TX_FRAME_NO_CRC(16) xor TX_FRAME_NO_CRC(17) xor TX_FRAME_NO_CRC(21) xor TX_FRAME_NO_CRC(23) xor TX_FRAME_NO_CRC(24) xor TX_FRAME_NO_CRC(25) xor TX_FRAME_NO_CRC(28) xor TX_FRAME_NO_CRC(29) xor TX_FRAME_NO_CRC(30) xor TX_FRAME_NO_CRC(32) xor TX_FRAME_NO_CRC(33) xor TX_FRAME_NO_CRC(37) xor TX_FRAME_NO_CRC(38);
CRC_RESULT_TX(3) <= '0' xor TX_FRAME_NO_CRC(0) xor TX_FRAME_NO_CRC(1) xor TX_FRAME_NO_CRC(2) xor TX_FRAME_NO_CRC(4) xor TX_FRAME_NO_CRC(8) xor TX_FRAME_NO_CRC(9) xor TX_FRAME_NO_CRC(13) xor TX_FRAME_NO_CRC(16) xor TX_FRAME_NO_CRC(17) xor TX_FRAME_NO_CRC(18) xor TX_FRAME_NO_CRC(21) xor TX_FRAME_NO_CRC(23) xor TX_FRAME_NO_CRC(26) xor TX_FRAME_NO_CRC(27) xor TX_FRAME_NO_CRC(29) xor TX_FRAME_NO_CRC(32) xor TX_FRAME_NO_CRC(34) xor TX_FRAME_NO_CRC(35) xor TX_FRAME_NO_CRC(36) xor TX_FRAME_NO_CRC(37) xor TX_FRAME_NO_CRC(39);
CRC_RESULT_TX(4) <= '1' xor TX_FRAME_NO_CRC(1) xor TX_FRAME_NO_CRC(2) xor TX_FRAME_NO_CRC(3) xor TX_FRAME_NO_CRC(5) xor TX_FRAME_NO_CRC(9) xor TX_FRAME_NO_CRC(10) xor TX_FRAME_NO_CRC(14) xor TX_FRAME_NO_CRC(17) xor TX_FRAME_NO_CRC(18) xor TX_FRAME_NO_CRC(19) xor TX_FRAME_NO_CRC(22) xor TX_FRAME_NO_CRC(24) xor TX_FRAME_NO_CRC(27) xor TX_FRAME_NO_CRC(28) xor TX_FRAME_NO_CRC(30) xor TX_FRAME_NO_CRC(33) xor TX_FRAME_NO_CRC(35) xor TX_FRAME_NO_CRC(36) xor TX_FRAME_NO_CRC(37) xor TX_FRAME_NO_CRC(38);
CRC_RESULT_TX(5) <= '1' xor TX_FRAME_NO_CRC(0) xor TX_FRAME_NO_CRC(2) xor TX_FRAME_NO_CRC(4) xor TX_FRAME_NO_CRC(5) xor TX_FRAME_NO_CRC(6) xor TX_FRAME_NO_CRC(7) xor TX_FRAME_NO_CRC(8) xor TX_FRAME_NO_CRC(9) xor TX_FRAME_NO_CRC(12) xor TX_FRAME_NO_CRC(18) xor TX_FRAME_NO_CRC(19) xor TX_FRAME_NO_CRC(20) xor TX_FRAME_NO_CRC(21) xor TX_FRAME_NO_CRC(22) xor TX_FRAME_NO_CRC(24) xor TX_FRAME_NO_CRC(27) xor TX_FRAME_NO_CRC(28) xor TX_FRAME_NO_CRC(29) xor TX_FRAME_NO_CRC(30) xor TX_FRAME_NO_CRC(32) xor TX_FRAME_NO_CRC(33) xor TX_FRAME_NO_CRC(34) xor TX_FRAME_NO_CRC(35) xor TX_FRAME_NO_CRC(39);
CRC_RESULT_TX(6) <= '0' xor TX_FRAME_NO_CRC(1) xor TX_FRAME_NO_CRC(3) xor TX_FRAME_NO_CRC(5) xor TX_FRAME_NO_CRC(6) xor TX_FRAME_NO_CRC(7) xor TX_FRAME_NO_CRC(8) xor TX_FRAME_NO_CRC(9) xor TX_FRAME_NO_CRC(10) xor TX_FRAME_NO_CRC(13) xor TX_FRAME_NO_CRC(19) xor TX_FRAME_NO_CRC(20) xor TX_FRAME_NO_CRC(21) xor TX_FRAME_NO_CRC(22) xor TX_FRAME_NO_CRC(23) xor TX_FRAME_NO_CRC(25) xor TX_FRAME_NO_CRC(28) xor TX_FRAME_NO_CRC(29) xor TX_FRAME_NO_CRC(30) xor TX_FRAME_NO_CRC(31) xor TX_FRAME_NO_CRC(33) xor TX_FRAME_NO_CRC(34) xor TX_FRAME_NO_CRC(35) xor TX_FRAME_NO_CRC(36);
CRC_RESULT_TX(7) <= '1' xor TX_FRAME_NO_CRC(2) xor TX_FRAME_NO_CRC(4) xor TX_FRAME_NO_CRC(6) xor TX_FRAME_NO_CRC(7) xor TX_FRAME_NO_CRC(8) xor TX_FRAME_NO_CRC(9) xor TX_FRAME_NO_CRC(10) xor TX_FRAME_NO_CRC(11) xor TX_FRAME_NO_CRC(14) xor TX_FRAME_NO_CRC(20) xor TX_FRAME_NO_CRC(21) xor TX_FRAME_NO_CRC(22) xor TX_FRAME_NO_CRC(23) xor TX_FRAME_NO_CRC(24) xor TX_FRAME_NO_CRC(26) xor TX_FRAME_NO_CRC(29) xor TX_FRAME_NO_CRC(30) xor TX_FRAME_NO_CRC(31) xor TX_FRAME_NO_CRC(32) xor TX_FRAME_NO_CRC(34) xor TX_FRAME_NO_CRC(35) xor TX_FRAME_NO_CRC(36) xor TX_FRAME_NO_CRC(37);
--------------------------------------------------------------------------------
-- At the desired clock edge, load the next state
--------------------------------------------------------------------------------
CurStDecode_RX:process (CLK, RESETn)
begin
-- Clear FSM to start state
if (RESETn = '0') then
curr_st <= idle_st;
elsif (rising_edge(CLK)) then
curr_st <= next_st;
end if;
end process CurStDecode_RX;
--------------------------------------------------------------------------------
last_value:process (CLK, RESETn) -- a clocked process only needs clock and async reset in its sensitivity list
begin
if (RESETn = '0') then
last_append <= '1';
elsif (rising_edge(CLK)) then
last_append <= APPEND_CRC;
end if;
end process last_value;
--------------------------------------------------------------------------------
-- Using the current state of the counter and the input signals
-- decide what the next state should be
--------------------------------------------------------------------------------
NxStDecode_Tx:process (curr_st, APPEND_CRC, last_append)
begin
-- FSM
case curr_st is
when idle_st=>
if APPEND_CRC = '1' and last_append = '0' then
next_st <= delay_crc;
else
next_st <= idle_st;
end if;
when delay_crc =>
next_st <= result;
when result =>
next_st <= idle_st;
when others =>
next_st <= idle_st;
end case;
end process NxStDecode_Tx;
--------------------------------------------------------------------------------
-- Using the current state of the counter
-- decide what the output should be
--------------------------------------------------------------------------------
OuStDecode_Tx:process (curr_st)
begin
case (curr_st) is
when idle_st =>
appended_crc <= '0';
when delay_crc =>
appended_crc <= '0';
when result =>
appended_crc <= '1';
when others =>
appended_crc <= '0';
end case;
end process OuStDecode_Tx;
--------------------------------------------------------------------------------
-- output appended frame
--------------------------------------------------------------------------------
process (RESETn, CLK) -- clocked process: only clock and async reset are needed here
begin
if (RESETn = '0') then
frame_to_output <= (others => '0');
elsif (rising_edge(CLK)) then
if (appended_crc = '1') then
frame_to_output(39 downto 0) <= frame_to_append(39 downto 0);
frame_to_output(47 downto 40) <= crc_result_tx;
end if;
end if;
end process ;
--------------------------------------------------------------------------------
-- output signals
--------------------------------------------------------------------------------
TX_FRAME <= frame_to_output;
frame_to_append <= TX_FRAME_NO_CRC;
CRC_APPENDED <= appended_crc;
--------------------------------------------------------------------------------
end arch;
When I try to test them, back to back, my speed drops significantly (below 150 MHz).
...
And just to be more precise, the limiting factor when they are both tested is frame_to_output (Tx) to curr_state (Rx).
From this I infer that by "back-to-back" testing you mean connecting the Tx output to the Rx input. As Russell says, you need to review the CRC_APPENDED and TX_FRAME paths between Tx and Rx:
- TX_FRAME (the Tx output) is registered on output in the Tx block. I will assume it goes straight to the CRC decoder in the Rx. This path cannot be re-pipelined any further.
- CRC_APPENDED comes straight out of the OuStDecode_Tx mux, i.e. combinational logic. Try generating CRC_APPENDED from a synchronous process:
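-- Registers the flag. This replaces the concurrent assignment
-- "CRC_APPENDED <= appended_crc;" so that the port keeps a single driver.
p_crc_appended : process (CLK, RESETn)
begin
if (RESETn = '0') then
CRC_APPENDED <= '0';
elsif (rising_edge(CLK)) then
CRC_APPENDED <= appended_crc;
end if;
end process;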
Plus, CRC_APPENDED and TX_FRAME will then change on the same clock edge. Currently CRC_APPENDED changes one clock cycle before TX_FRAME.
tl;dr: try registering appended_crc to generate CRC_APPENDED.
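One way to check the alignment claim in simulation is a small testbench along these lines. This is only a sketch: the testbench name tb_append_tx, the 10 ns clock period and the test pattern are arbitrary assumptions, and it assumes Append_Tx_FSM has been compiled into library work with CRC_APPENDED driven by the registered process instead of the concurrent assignment.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity tb_append_tx is
end entity tb_append_tx;

architecture sim of tb_append_tx is
  constant CLK_PERIOD : time := 10 ns;  -- arbitrary simulation clock (assumption)
  signal clk       : std_logic := '0';
  signal resetn    : std_logic := '0';  -- start in reset
  signal append    : std_logic := '0';
  -- arbitrary 40-bit test pattern (assumption)
  signal frame_in  : std_logic_vector(39 downto 0) := x"123456789A";
  signal flag      : std_logic;
  signal frame_out : std_logic_vector(47 downto 0);
  signal done      : boolean := false;
begin
  -- free-running clock, stopped once the stimulus process has finished
  clk <= not clk after CLK_PERIOD / 2 when not done else '0';

  dut : entity work.Append_Tx_FSM
    port map (
      CLK             => clk,
      RESETn          => resetn,
      APPEND_CRC      => append,
      TX_FRAME_NO_CRC => frame_in,
      CRC_APPENDED    => flag,
      TX_FRAME        => frame_out);

  stimulus : process
  begin
    wait for 3 * CLK_PERIOD;
    resetn <= '1';                 -- release the asynchronous reset
    wait until rising_edge(clk);   -- last_append captures '0'
    wait until rising_edge(clk);
    append <= '1';                 -- rising edge on APPEND_CRC starts the FSM
    wait until flag = '1';
    wait for CLK_PERIOD / 4;       -- let delta cycles settle, sample mid-cycle
    -- With the registered flag, the data part of the frame is already valid
    -- here. With the purely combinational flag this assertion fires, because
    -- TX_FRAME only updates on the following clock edge.
    assert frame_out(39 downto 0) = frame_in
      report "TX_FRAME not valid while CRC_APPENDED is high"
      severity error;
    append <= '0';
    wait for 2 * CLK_PERIOD;
    done <= true;
    wait;
  end process stimulus;
end architecture sim;
```

The check deliberately looks only at the 40 data bits, so it does not depend on the CRC value itself. It says nothing about achievable clock frequency; that still has to be confirmed from the timing report after synthesis and place and route.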