Simplifying A State Machine To Reduce Logic Levels and Meet Timing


My design currently isn't meeting timing. I've tried running it on a slower clock and pipelining the inputs and outputs, but the problem is always the same: too many levels of logic. Does anyone have tips for making this logic more timing-friendly?

signal ctr : std_logic_vector(9 downto 0);
signal sig_bit_shift : std_logic_vector (15 downto 0);

begin

process(clk_p)
begin
    if rising_edge(clk_p) then
        if rst_i = '1' or nuke = '1' then
            ctr <= (others => '0'); 
            state <= ST_IDLE;
        elsif unsigned(event_settings) < 1 then -- disables
            state <= ST_IDLE;
        elsif unsigned(event_settings) = 1 then -- always on
            state <= ST_ENABLE;    
        else
            case state is
            when ST_IDLE =>
                if ctr = (unsigned(event)-2) then                     
                    state <= ST_ENABLE;
                elsif unsigned(ctr) = 1 and sig = '0' then --catches first word
                    state <= ST_ENABLE;                       
                elsif sig = '1' then
                    ctr <= ctr + 1;
                end if;
            when ST_ENABLE =>
                if s_sig = '1' then
                    state <= ST_IDLE;
                    if unsigned(s_evt) > 1 then
                        ctr <= (others => '0');
                    end if;
                end if;
            end case;
        end if;
    end if; 
end process;

UPDATE:

process(clk_p)
begin
    if rising_edge(clk_p) then
        if rst_i = '1' or nuke = '1' then
            ctr <= x"00" & "10"; 
            state <= ST_IDLE;
        elsif settings = '1' then
            case state is
            when ST_IDLE =>
                if ctr = (unsigned(event)) then                     
                    state <= ST_ENABLE;
                elsif unsigned(ctr) = 1 and sig = '0' then --catches first word -- this is the part which when added, fails timing
                    state <= ST_ENABLE;                       
                elsif sig = '1' then
                    ctr <= ctr + 1;
                end if;
            when ST_ENABLE =>
                if s_sig = '1' then
                    state <= ST_IDLE;
                    if unsigned(s_evt) > 1 then
                        ctr <= X"00" & "10";
                    end if;
                end if;
            end case;
        end if;
    end if; 
end process;

I think it is also slowed down by where the signal comes from:

sig <= sig_token when unsigned(SIG_DELAY) < 1 else (sig_bit_shift(to_integer(unsigned(SIG_DELAY)-1)));

process(clk_p) -- delays sig
begin
    if rising_edge(clk_p) then
        if rst = '1' then
            sig_bit_shift <= (others => '0');
        else
            sig_bit_shift <= sig_bit_shift(sig_bit_shift'high-1 downto 0) & sig_token;
        end if;
    end if;
end process;

UPDATE 2 :

It seems that half the routing went into the delay above, so I'm going to try to fix it with this:

signal sig_del_en : std_logic;
signal sig_del_sel : integer; 

begin
process(clk_p)
begin
    if rising_edge(clk_p) then
        if unsigned(SIG_DELAY) = 0 then
            sig_del_en <= '0';
        else
            sig_del_en <= '1';
        end if;
        sig_del_sel <= to_integer(unsigned(SIG_DELAY)-1);
    end if;
end process;

   sig <= sig_token when sig_del_en = '0' else (sig_bit_shift(sig_del_sel));

Solution

  • Some of the "slow" operations are the array =, which requires a compare over all bits of the argument, and < and >, which require a subtraction over all bits of the argument. You may therefore improve timing in a cycle if there is sufficient time in the previous cycle to generate the compare result up front as a std_logic. This may be relevant for:

    • unsigned(event_settings) < 1
    • unsigned(event_settings) = 1
    • ctr = (unsigned(event)-2)
    • unsigned(ctr) = 1
    • unsigned(s_evt) > 1

    The code to generate the different std_logic values depends on the way the related signal is generated, but an example can be:

    process (clk) is
      variable event_settings_v : std_logic_vector(event_settings'range);
    begin
      if rising_edge(clk) then
        ...
        event_settings_v := ... code for generating event_settings;  -- Variable with value
        event_settings <= event_settings_v;  -- Signal drive from variable
        if unsigned(event_settings_v) < 1 then
          unsigned_event_settings_tl_1 <= '1';
        else
          unsigned_event_settings_tl_1 <= '0';
        end if;
      end if;
    end process;
    

    The code unsigned(event_settings) < 1 in the state machine can then be changed to unsigned_event_settings_tl_1 = '1', which may improve timing if this compare is in the critical path.
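
    As a more concrete sketch of the same idea, here it is applied to the unsigned(ctr) = 1 compare from the question. The next counter value is held in a variable, so the flag is registered on the same clock edge as the counter itself. The entity name ctr_flag_sketch, the port inc and the flag ctr_is_one are hypothetical names used only for illustration; the counter here is a simplified increment-only version, not the full state machine.

    ```vhdl
    library ieee;
    use ieee.std_logic_1164.all;
    use ieee.numeric_std.all;

    -- Hypothetical example entity, not part of the original design
    entity ctr_flag_sketch is
      port (
        clk_p      : in  std_logic;
        rst_i      : in  std_logic;
        inc        : in  std_logic;  -- assumed increment enable
        ctr_is_one : out std_logic   -- registered result of "ctr = 1"
      );
    end entity;

    architecture rtl of ctr_flag_sketch is
      signal ctr : unsigned(9 downto 0) := (others => '0');
    begin
      process (clk_p) is
        variable ctr_v : unsigned(ctr'range);
      begin
        if rising_edge(clk_p) then
          -- Work out the next counter value first
          ctr_v := ctr;
          if rst_i = '1' then
            ctr_v := (others => '0');
          elsif inc = '1' then
            ctr_v := ctr + 1;
          end if;
          ctr <= ctr_v;
          -- Compare the next value, so the flag is valid in the same cycle as ctr
          if ctr_v = 1 then
            ctr_is_one <= '1';
          else
            ctr_is_one <= '0';
          end if;
        end if;
      end process;
    end architecture;
    ```

    The state machine could then test ctr_is_one = '1' instead of unsigned(ctr) = 1. Note that the compare now sits after the incrementer, so this only helps if the path that updates the counter has slack to spare.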

    Using the asynchronous reset typically available on the flip-flop for rst_i = '1' may also improve timing, since it removes logic from the synchronous part. It is unlikely to give a significant improvement, but it is typically good design practice, since it maximizes the time available for synchronous logic. The asynchronous reset is used through a coding style like:

    process (rst_i, clk_p) is
    begin
      if rst_i = '1' then
        ... Apply asynchronous reset value to signals
      elsif rising_edge(clk_p) then
        ... Synchronous update of signals
      end if;
    end process;