What's wrong with this simple VHDL for loop?

For some reason the OutputTmp variable will always be uninitialized in the simulation. I can make it work without a for loop but I really want to automate it so I can later move on to bigger vectors. The intermediate variable works fine.

Note: I'm a DBA and C# programmer, really new to VHDL, sorry if this is a stupid question.

architecture Arch of VectorMultiplier4 is

signal Intermediate : std_logic_vector(0 to 4);
signal OutputTmp : std_logic;

begin

process (Intermediate)
begin

  for i in 0 to 4 loop
    Intermediate(i) <= (VectorA(i) AND VectorB_Reduced(i));    
  end loop;

  --THIS IS WHAT DOES NOT WORK APPARENTLY
  OutputTmp <= '0';
  for i in 0 to 4 loop
    OutputTmp <= OutputTmp XOR Intermediate(i);
  end loop;
Output <= OutputTmp;
end process;

end architecture;

Thanks!

Solution

This is slightly different from the answer fru1tbat points to.

One characteristic of a signal assignment is that it is scheduled for the current or a future simulation time. No signal update takes effect while any process is still executing in the current simulation cycle (elaboration reduces every statement involving signals to processes, optionally nested in block statements that preserve the hierarchy).

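You can't rely on a signal value you have just assigned (scheduled for update) during the same simulation cycle; the new value isn't available yet.

In the question's process every assignment to OutputTmp replaces the one scheduled before it, so only the last one, OutputTmp <= OutputTmp XOR Intermediate(4), survives. It reads the current value of OutputTmp, which starts out as 'U', and 'U' XOR anything is 'U', so OutputTmp never leaves 'U'.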

A signal assignment without a delay in the waveform (no after Time) takes effect in the next simulation cycle, which will be a delta cycle. Within a process you can only 'see' the current value of a signal.
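
The scheduling behaviour described above can be observed with a small simulation-only sketch. The entity name DeltaDemo and the names S and V are made up for illustration and are not part of the question's design.

    library ieee;
    use ieee.std_logic_1164.all;

    entity DeltaDemo is
    end entity;

    architecture Sim of DeltaDemo is
        signal S : std_logic := '0';
    begin
        process
            variable V : std_logic := '0';
        begin
            S <= '1';   -- scheduled only; S still reads '0' in this cycle
            V := '1';   -- a variable assignment takes effect immediately
            assert S = '0' report "signal changed in the same cycle" severity failure;
            assert V = '1' report "variable did not update" severity failure;

            wait for 0 ns;  -- suspend for one delta cycle
            assert S = '1' report "signal not updated after a delta cycle" severity failure;

            report "delta-cycle demo passed";
            wait;           -- suspend forever
        end process;
    end architecture;

The first assertion holds because the process is still running when S is read; the update only becomes visible after the process suspends at the wait statement.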

Because OutputTmp appears to be named as an intermediary value you could declare it as a variable in the process (deleting the signal declaration, or renaming one or the other).

    process (VectorA, VectorB_Reduced)
        variable OutputTmpvar:  std_logic;
        variable Intermediate: std_logic_vector (0 to 4);
    begin

      for i in 0 to 4 loop
        Intermediate(i) := (VectorA(i) AND VectorB_Reduced(i));    
      end loop;

      -- A variable assignment takes effect immediately
      OutputTmpvar := '0';
      for i in 0 to 4 loop
        OutputTmpvar := OutputTmpvar XOR Intermediate(i);
      end loop;
    -- Output is a signal (port), so it still needs a signal assignment
    Output <= OutputTmpvar;
    end process;

And this will produce the parity (XOR reduction) of the elements of the Intermediate array: '1' when an odd number of elements are '1'.

Note that Intermediate has also been made a variable for the same reason and VectorA and VectorB_Reduced have been placed in the sensitivity list instead of Intermediate.

And all of this can be further reduced.

    process (VectorA, VectorB_Reduced)
        variable OutputTmpvar:  std_logic;
    begin

      -- A variable assignment takes effect immediately
      OutputTmpvar := '0';
      for i in 0 to 4 loop
        OutputTmpvar := OutputTmpvar XOR (VectorA(i) AND VectorB_Reduced(i));
      end loop;
    Output <= OutputTmpvar;
    end process;

This removes Intermediate entirely.

Tailoring for synthesis and size extensibility

And if you need to synthesize the loop:

    process (VectorA, VectorB_Reduced)
        variable OutputTmp: std_logic_vector (VectorA'RANGE) := (others => '0');
    begin

      for i in VectorA'RANGE loop
          if i = VectorA'LEFT then
              OutputTmp(i) := (VectorA(i) AND VectorB_Reduced(i));
          else 
              OutputTmp(i) := OutputTmp(i-1) XOR (VectorA(i) AND VectorB_Reduced(i));
          end if;
      end loop;
    Output <= OutputTmp(VectorA'RIGHT);
    end process;

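Here there's an assumption that VectorA and VectorB_Reduced have the same dimensionality (bounds), and that the range is ascending (as in 0 to 4), because OutputTmp(i-1) has to refer to the element computed in the previous iteration.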

What this does is provide every node of the synthesis result 'netlist' with a unique name, and it will generate a chain of four XOR gates fed by five AND gates.

This process also shows how to deal with input arrays of any size with matching bounds (VectorA and VectorB_Reduced in the example) by using attributes. If you need to deal with the case of the two inputs having different bounds but the same length, you can create variable copies of them with the same bounds, something you'd likely do as a matter of form if this were implemented in a function.
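
A minimal sketch of that function form, assuming only that both arguments have the same length. The package name VectorOps and the function name XorOfAnds are hypothetical; they are not from the question.

    library ieee;
    use ieee.std_logic_1164.all;

    package VectorOps is
        -- Hypothetical helper: XOR of the element-wise AND of A and B
        function XorOfAnds (A, B : std_logic_vector) return std_logic;
    end package;

    package body VectorOps is
        function XorOfAnds (A, B : std_logic_vector) return std_logic is
            -- Copies with identical ascending bounds, whatever the actuals' bounds are
            variable An     : std_logic_vector (0 to A'LENGTH - 1) := A;
            variable Bn     : std_logic_vector (0 to B'LENGTH - 1) := B;
            variable Result : std_logic := '0';
        begin
            assert A'LENGTH = B'LENGTH
                report "XorOfAnds: operand lengths differ" severity failure;
            for i in An'RANGE loop
                Result := Result xor (An(i) and Bn(i));
            end loop;
            return Result;
        end function;
    end package body;

With use work.VectorOps.all; in the context clause, the process could then be replaced by a concurrent assignment such as Output <= XorOfAnds(VectorA, VectorB_Reduced);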

Flattening the chain of XORs is something handled in the synthesis domain using performance constraints. (For a lot of FPGA architectures the XORs will fit in one LUT because of XOR's commutative and associative properties.)
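
If your tools support VHDL-2008, the same function can also be written without an explicit chain by using the unary xor reduction operator, which leaves the gate structure entirely to synthesis. This is a sketch, not part of the original answer; the port declaration is an assumption, because the question does not show the entity.

    library ieee;
    use ieee.std_logic_1164.all;

    entity VectorMultiplier4 is
        port (
            -- Assumed ports; the question only shows the architecture
            VectorA         : in  std_logic_vector (0 to 4);
            VectorB_Reduced : in  std_logic_vector (0 to 4);
            Output          : out std_logic
        );
    end entity;

    architecture Reduced of VectorMultiplier4 is
    begin
        -- VHDL-2008: element-wise AND, then XOR-reduce the result to one bit
        Output <= xor (VectorA and VectorB_Reduced);
    end architecture;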

(The above process has been analyzed, elaborated and simulated in a VHDL model).
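
For anyone who wants to repeat that check, here is a hedged sketch of a self-checking testbench. The entity ports and the testbench names (VectorMultiplier4_tb, A, B, Y) are assumptions, since the question does not show the entity declaration; the architecture uses the reduced process from above with VectorA'RANGE as the loop range.

    library ieee;
    use ieee.std_logic_1164.all;

    entity VectorMultiplier4 is
        port (
            -- Assumed ports; not shown in the question
            VectorA         : in  std_logic_vector (0 to 4);
            VectorB_Reduced : in  std_logic_vector (0 to 4);
            Output          : out std_logic
        );
    end entity;

    architecture Arch of VectorMultiplier4 is
    begin
        process (VectorA, VectorB_Reduced)
            variable OutputTmpvar : std_logic;
        begin
            OutputTmpvar := '0';
            for i in VectorA'RANGE loop
                OutputTmpvar := OutputTmpvar xor (VectorA(i) and VectorB_Reduced(i));
            end loop;
            Output <= OutputTmpvar;
        end process;
    end architecture;

    library ieee;
    use ieee.std_logic_1164.all;
    use ieee.numeric_std.all;

    entity VectorMultiplier4_tb is
    end entity;

    architecture Test of VectorMultiplier4_tb is
        signal A, B : std_logic_vector (0 to 4) := (others => '0');
        signal Y    : std_logic;
    begin
        DUT: entity work.VectorMultiplier4(Arch)
            port map (VectorA => A, VectorB_Reduced => B, Output => Y);

        process
            variable Ones : natural;
        begin
            -- Apply all 32 x 32 input combinations
            for ai in 0 to 31 loop
                for bi in 0 to 31 loop
                    A <= std_logic_vector(to_unsigned(ai, 5));
                    B <= std_logic_vector(to_unsigned(bi, 5));
                    wait for 1 ns;  -- let the inputs and Output settle
                    -- Reference: count positions where both inputs are '1'
                    Ones := 0;
                    for i in A'RANGE loop
                        if A(i) = '1' and B(i) = '1' then
                            Ones := Ones + 1;
                        end if;
                    end loop;
                    if Ones mod 2 = 1 then
                        assert Y = '1' report "expected '1'" severity failure;
                    else
                        assert Y = '0' report "expected '0'" severity failure;
                    end if;
                end loop;
            end loop;
            report "all input combinations checked";
            wait;
        end process;
    end architecture;

The wait for 1 ns between applying the inputs and checking Y matters for the same reason as in the question: the new values of A, B and Y are not visible until the stimulus process has suspended.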