VHDL Logical Simulation Error on add and shift Multiplier

I am trying to build a sequential add-and-shift multiplier, but the output value is always wrong in the final simulation. I used a state machine to implement the control block for the partial sums.

When I compute 1 x 1, the output is wrong:

[Image: output error in the VWF file]

This error appears for all multiplicand and multiplier inputs.

I am using the following code to make the sums:

 library IEEE;
 use IEEE.std_logic_1164.all;

 entity adder_8bits is 
 port (
     cin: in STD_LOGIC;
     a,b: in STD_LOGIC_VECTOR(7 DOWNTO 0);
     s: out STD_LOGIC_VECTOR(8 DOWNTO 0)
 );
 end adder_8bits;

 architecture arch_1 of adder_8bits is 
 begin 
     process(a,b,cin)
     variable soma:std_logic_vector(8 downto 0);
     variable c:std_logic; 
     begin
          c := cin;
          for i in 0 to 7 loop
                soma(i) := a(i) xor b(i) xor c;
                c := (a(i) and b(i)) or ((a(i) xor b(i)) and c);
          end loop;
          s(7 downto 0) <= soma(7 downto 0);
          s(8) <= c;
      end process;
end arch_1;

An 8-bit adder to sum the partial results.

 library IEEE;
 use IEEE.std_logic_1164.all;
 use IEEE.numeric_std.all;

 entity sum_register is 
 port (
     i_DIN   : in UNSIGNED(8 DOWNTO 0);
     i_LOAD  : in STD_LOGIC;
     i_CLEAR : in STD_LOGIC;
     i_SHIFT : in STD_LOGIC;
     i_CLK : in STD_ULOGIC;
     o_DOUT  : buffer UNSIGNED(15 downto 0)
 );
 end sum_register;


 architecture arch_1 of sum_register is 
 begin
     process(i_CLK)
     begin
     IF rising_edge(i_CLK) THEN
        IF (i_CLEAR = '1') THEN
            o_DOUT <= "0000000000000000";
        ELSIF (i_LOAD = '1') THEN
            o_DOUT(15 downto 7) <= i_DIN;
        ELSIF (i_SHIFT = '1') THEN
            IF (i_DIN(8) = '1') THEN
              o_DOUT <= o_DOUT SRL 1;
            END IF;
        END IF;
      END IF;
      end process;
end arch_1;

A sum register that holds the current sum and shifts it before the next addition.

LIBRARY IEEE;
USE IEEE.std_logic_1164.ALL;
USE IEEE.std_logic_unsigned.ALL;
use IEEE.std_logic_arith.ALL;

ENTITY controller IS
  PORT (
        i_CLK     : IN STD_ULOGIC;
        i_START   : IN  STD_LOGIC; 
        i_MLTPLR  : IN STD_LOGIC_VECTOR(7 downto 0);
        o_MDLD    : OUT STD_LOGIC; 
        o_MRLD    : OUT STD_LOGIC;  
        o_RSLD    : OUT STD_LOGIC;
        o_RSCLR   : OUT STD_LOGIC;
        o_RSSHR   : OUT STD_LOGIC
      );     
END controller;

ARCHITECTURE arch_1 OF controller IS
  TYPE state_type IS (s0, s1, s2, s3, s4, s5, s6, s7, s8, s9, s10, s11, s12, s13, s14, s15, s16, s17, s18);
  SIGNAL stateT : state_type;
BEGIN
  PROCESS(i_CLK)
  BEGIN
  IF rising_edge(i_CLK) THEN
      IF (i_START = '0') THEN
        stateT <= s0;
      ELSE
        CASE stateT IS
          when s0 => if (i_START = '1') then 
                         stateT <= s1; 
                     end if;
          when s1 =>  stateT <= s2;          
          when s2 => if (i_MLTPLR(0) = '1') then
                         stateT <= s3;
                     else
                         stateT <= s4;
                     end if;
          when s3 => stateT <= s4;                    
          when s4 => if (i_MLTPLR(1) = '1') then
                         stateT <= s5;
                     else
                         stateT <= s6;
                     end if;
          when s5 => stateT <= s6;
          when s6 => if (i_MLTPLR(2) = '1') then
                         stateT <= s7;
                     else
                         stateT <= s8;
                     end if;
          when s7 => stateT <= s8;
          when s8 => if (i_MLTPLR(3) = '1') then
                         stateT <= s9;
                     else
                         stateT <= s10;
                     end if;
          when s9 => stateT <= s10;
          when s10 => if (i_MLTPLR(4) = '1') then
                         stateT <= s11;
                     else
                         stateT <= s12;
                     end if;
          when s11 => stateT <= s12;
          when s12 => if (i_MLTPLR(5) = '1') then
                         stateT <= s13;
                     else
                         stateT <= s14;
                     end if;  
          when s13 => stateT <= s14; 
          when s14 => if (i_MLTPLR(6) = '1') then
                         stateT <= s15;
                     else
                         stateT <= s16;
                     end if;  
          when s15 => stateT <= s16; 
          when s16 => if (i_MLTPLR(7) = '1') then
                         stateT <= s17;
                     else
                         stateT <= s18;
                     end if;           
          when s17 => stateT <= s18; 
          when s18 => stateT <= s0;    
        END CASE;
      END IF;
    END IF;
  END PROCESS;

  o_MDLD <= '1' when (stateT = s1) else '0';  
  o_MRLD <= '1' when (stateT = s1) else '0';  
  o_RSCLR <= '1' when (stateT = s1) else '0';
  o_RSLD  <= '1' when (stateT = s3 or stateT = s5 or 
                       stateT = s7 or stateT = s9 or 
                       stateT = s11 or stateT = s13 or 
                       stateT = s15 or stateT = s17) else '0';    
  o_RSSHR <= '1' when (stateT = s4 or stateT = s6 or 
                       stateT = s8 or stateT = s10 or 
                       stateT = s12 or stateT = s14 or 
                       stateT = s16 or stateT = s18) else '0'; 

END arch_1;

A state machine controller that drives the control inputs of the sum register.

I am using a BDF file to connect all the blocks. The only difference from the schematic below is that the adder block has a carry-in input. All blocks are clocked from the same pin.

[Image: controller simulation]

Does anyone have any idea what is causing this error?

Solution

When implementing your answer:

architecture arch_1 of sum_register is 
 begin
     process(i_CLK)
     begin
     IF rising_edge(i_CLK) THEN
        IF (i_CLEAR = '1') THEN
            o_DOUT <= "0000000000000000";
        ELSIF (i_LOAD = '1') THEN
            o_DOUT(15 downto 8) <= i_DIN;
        ELSIF (i_SHIFT = '1') THEN
              o_DOUT <= o_DOUT SRL 1;

        END IF;
      END IF;
      end process;
end arch_1;

what happens when you multiply 255 x 255?

Your product is 1, which would be correct if this were a signed multiply. You specified an unsigned multiplier and multiplicand, for which the correct answer is 65025 ("1111111000000001"). Because you have separate load and shift operations, you need to save the discarded carry and use it as the bit shifted in. And because successive multiplier bits can be '0', you need to clear that carry after it is used in a shift (defaulting to the expected sign, which is '0' for unsigned multiplies).

You can do that by keeping your original 9-bit path for the adder_8bits sum and saving the carry:

architecture foo of sum_register is
    signal carry: std_logic;
begin
    process (i_clk)
    begin
        if rising_edge(i_clk) then
            if i_clear = '1' then
                o_dout <= (others => '0');
                carry <= '0';
            elsif i_load = '1' then
                o_dout(15 downto 8) <= i_din (7 downto 0);
                carry <= i_din(8);
            elsif i_shift = '1' then
                o_dout <= carry & o_dout(15 downto 1);
                carry <= '0';  -- expected sign for multiply result
            end if;
        end if;
    end process;
end architecture;

Note that the carry is cleared when it is consumed by a shift, so only a preceding load can set carry = '1'.
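To see the saved carry at work, here is a minimal testbench sketch for the first two steps of 255 x 255. It assumes the sum_register entity from the question and architecture foo above are compiled into library work; the testbench name sum_register_tb, the signal names and the 10 ns clock period are my own choices. The values on din are what adder_8bits would produce in the full datapath: 0 + 255 = 255 (no carry) for the first multiplier bit, then 0x7F + 0xFF = 0x17E (carry set) for the second.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity sum_register_tb is  -- hypothetical testbench name
end entity;

architecture sim of sum_register_tb is
    signal din   : unsigned(8 downto 0) := (others => '0');
    signal load  : std_logic := '0';
    signal clear : std_logic := '0';
    signal shift : std_logic := '0';
    signal clk   : std_ulogic := '0';
    signal dout  : unsigned(15 downto 0);
begin
    -- Assumes entity sum_register with architecture foo is already in work.
    dut : entity work.sum_register(foo)
        port map (
            i_din   => din,
            i_load  => load,
            i_clear => clear,
            i_shift => shift,
            i_clk   => clk,
            o_dout  => dout
        );

    stimulus : process
        -- Produce one rising clock edge with the controls currently applied.
        procedure tick is
        begin
            wait for 5 ns;
            clk <= '1';
            wait for 5 ns;
            clk <= '0';
        end procedure;
    begin
        clear <= '1'; tick; clear <= '0';

        -- Multiplier bit 0: sum is 0 + 255 = 0x0FF, carry out is '0'.
        din <= to_unsigned(16#0FF#, 9);
        load <= '1'; tick; load <= '0';
        shift <= '1'; tick; shift <= '0';
        assert dout = x"7F80"
            report "unexpected value after first shift" severity error;

        -- Multiplier bit 1: sum is 0x7F + 0xFF = 0x17E, carry out is '1'.
        din <= to_unsigned(16#17E#, 9);
        load <= '1'; tick; load <= '0';
        shift <= '1'; tick; shift <= '0';
        -- The saved carry lands in bit 15; without it this would read 0x3F40.
        assert dout = x"BF40"
            report "carry was not shifted in" severity error;

        report "sum_register_tb finished" severity note;
        wait;
    end process;
end architecture;
```

0xBF40 shifted right by six more places is 765, which is 255 x 3, the expected partial product after two multiplier bits.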

And this contrivance could go away if you had a load-and-shift instruction and a shift instruction instead of a load instruction and a shift instruction. That would require switching from a Moore state machine to a Mealy state machine, and it would reduce the number of states.

Your controller, a Moore state machine, can traverse 16 states, both shifting and loading, for a multiplier of "11111111". A Mealy machine could do that in 8 states with load-and-shift and shift operations in sum_register.

And the sum_register would look something like:

architecture fum of sum_register is

begin
    process (i_clk)
    begin
        if rising_edge(i_clk) then
            if i_clear = '1' then
                o_dout <= (others => '0');
            elsif i_load_shift = '1' then
                o_dout <= i_din & o_dout(7 downto 1);  -- 9-bit sum lands in 15 downto 7, old bits 7 downto 1 move down
            elsif i_shift = '1' then
                o_dout <= '0' & o_dout(15 downto 1); -- '0' expected result sign
            end if;
        end if;
    end process;
end architecture;

for a 9-bit sum from adder_8bits. Note that the i_load signal is renamed to i_load_shift, and the controller state machine would need to be rewritten as a Mealy machine issuing either i_load_shift = '1' or i_shift = '1' (and the other '0'), depending on whether the evaluated multiplier bit is a '1' or a '0'.

Note there are plenty of hints here on how to do signed multiplies, even though you declared the multiplier, multiplicand and product as unsigned.
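One possible shape for such a Mealy controller is sketched below. It is not taken from your design: the entity name controller_mealy, the output o_rsldshr (meant to drive i_load_shift), the state names and the bit_index counter are all assumptions of mine, and holding in a final state until i_start drops is a design choice rather than something your original controller does.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity controller_mealy is  -- hypothetical name
    port (
        i_clk     : in  std_ulogic;
        i_start   : in  std_logic;
        i_mltplr  : in  std_logic_vector(7 downto 0);
        o_mdld    : out std_logic;
        o_mrld    : out std_logic;
        o_rsclr   : out std_logic;
        o_rsldshr : out std_logic;  -- hypothetical: drives i_load_shift
        o_rsshr   : out std_logic
    );
end entity;

architecture sketch of controller_mealy is
    type state_type is (st_idle, st_clear, st_run, st_done);
    signal state     : state_type := st_idle;
    signal bit_index : natural range 0 to 7 := 0;  -- multiplier bit under test
begin
    process (i_clk)
    begin
        if rising_edge(i_clk) then
            if i_start = '0' then
                state <= st_idle;
            else
                case state is
                    when st_idle =>
                        state <= st_clear;
                    when st_clear =>
                        bit_index <= 0;
                        state <= st_run;
                    when st_run =>
                        -- One clock per multiplier bit, eight in total.
                        if bit_index = 7 then
                            state <= st_done;
                        else
                            bit_index <= bit_index + 1;
                        end if;
                    when st_done =>
                        null;  -- assumption: hold the product until i_start drops
                end case;
            end if;
        end if;
    end process;

    -- Same role as s1 in the original controller.
    o_mdld  <= '1' when state = st_clear else '0';
    o_mrld  <= '1' when state = st_clear else '0';
    o_rsclr <= '1' when state = st_clear else '0';

    -- Mealy outputs: they depend on the state and on the current multiplier bit.
    o_rsldshr <= '1' when (state = st_run) and (i_mltplr(bit_index) = '1') else '0';
    o_rsshr   <= '1' when (state = st_run) and (i_mltplr(bit_index) = '0') else '0';
end architecture;
```

With this arrangement every multiplier bit costs exactly one clock, whether it is '1' or '0', and exactly one of the two outputs is asserted during each of the eight st_run cycles.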