arrays types parallel-processing vhdl addition

VHDL - Simultaneous addition of a large 2D array: what is the syntax for this?

I have reached a point in my design where we need to massively increase parallelisation, and we have plenty of resources to spare in the FPGA.

To that end, I have the type defined as

type LargeByteArray is array(0 to 10000) of std_logic_vector(7 downto 0);

I have two of these that I want to average "byte-wise" in as few operations as possible, using a right shift to divide by two. So for example, avg(0) should be an 8-bit standard logic vector equal to (a_in(0) + b_in(0)) / 2, avg(1) should be (a_in(1) + b_in(1)) / 2, and so on. Assume for the moment we don't care that two 8-bit numbers add to a 9-bit result. I want to be able to do all 10000 operations in parallel.

I think I need to use an intermediate step to be able to bitshift like this, using the Signal "inter".

entity Large_adder is
Port ( a_in : in LargeByteArray;
       b_in : in LargeByteArray;
       avg_out : out LargeByteArray);  -- must be mode out: a port with no mode defaults to in
end Large_adder;

architecture arch of Large_adder is
    SIGNAL inter : LargeByteArray;
begin

My current code looks a bit like this:

inter(0) <= std_logic_vector((unsigned(a_in(0)) + unsigned(b_in(0))));
inter(1) <= std_logic_vector((unsigned(a_in(1)) + unsigned(b_in(1))));

10000 lines later...

inter(10000) <= std_logic_vector((unsigned(a_in(10000)) + unsigned(b_in(10000))));

And a similar story for finally assigning the output with the bit shift

avg_out(0) <= '0' & inter(0)(7 downto 1);
avg_out(1) <= '0' & inter(1)(7 downto 1);

All the way down to 10000.

Surely there is a more space efficient way to specify this.

I have tried

inter <= std_logic_vector((unsigned(a_in) + unsigned(b_in)));

but I get an error saying it found '0' matching definitions for the <= operator.

Obviously the number could be reduced from 10000 if what I'm trying to achieve looks unreasonable, but in general, how do you write this sort of operation elegantly, without a line for every element of my type?

If I had to guess I would say we can describe to the "<=" operator what to do when met with LargeByteArray types. But I do not know how to do so or where to define this behaviour.

Thanks

Solution

You have two choices. Either a for loop inside a process:

  process (a_in, b_in)
  begin
    for I in 0 to 10000 loop
      inter(I) <= std_logic_vector((unsigned(a_in(I)) + unsigned(b_in(I))));
    end loop;
  end process;

  process (inter)
  begin
    for I in 0 to 10000 loop
      avg_out(I) <= '0' & inter(I)(7 downto 1);
    end loop;
  end process;

or a generate loop outside a process:

G1: for I in 0 to 10000 generate
  inter(I) <= std_logic_vector((unsigned(a_in(I)) + unsigned(b_in(I))));
end generate;

G2: for I in 0 to 10000 generate
  avg_out(I) <= '0' & inter(I)(7 downto 1);
end generate;

https://www.edaplayground.com/x/3hJV

The simulator executes the lines inside the for loop (inside the process) sequentially, because simulators always execute the lines inside a process sequentially (but concurrently with other processes and concurrent assignments). The simulator executes the lines inside the generate loop concurrently, because a generate loop is a language construct that is used to generate multiple concurrent things. Because of the topology of your circuit (everything is parallel), both methods will behave the same in simulation and in synthesis.
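On the guess in the question about teaching "<=" to handle LargeByteArray: signal assignment is not an operator in VHDL, so it cannot be overloaded. What you can do is hide the loop in a function declared in a package, so the architecture body shrinks to one concurrent assignment. The following is only a sketch under some assumptions: the package name byte_array_pkg and the function name average are made up for illustration, and unlike the loops above it uses a 9-bit intermediate sum so the carry is kept before the shift (the question said the carry could be ignored, so treat that as optional).

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical package name; the question does not say where the type lives.
package byte_array_pkg is
  type LargeByteArray is array (0 to 10000) of std_logic_vector(7 downto 0);

  -- Element-wise average of two arrays, rounding down.
  function average (a, b : LargeByteArray) return LargeByteArray;
end package byte_array_pkg;

package body byte_array_pkg is
  function average (a, b : LargeByteArray) return LargeByteArray is
    variable sum    : unsigned(8 downto 0);  -- 9 bits so the carry is kept
    variable result : LargeByteArray;
  begin
    for i in LargeByteArray'range loop
      sum       := resize(unsigned(a(i)), 9) + resize(unsigned(b(i)), 9);
      result(i) := std_logic_vector(sum(8 downto 1));  -- drop the LSB: divide by 2
    end loop;
    return result;
  end function average;
end package body byte_array_pkg;

library ieee;
use ieee.std_logic_1164.all;
use work.byte_array_pkg.all;

entity Large_adder is
  port (a_in    : in  LargeByteArray;
        b_in    : in  LargeByteArray;
        avg_out : out LargeByteArray);
end entity Large_adder;

architecture arch of Large_adder is
begin
  -- One concurrent assignment; the loop inside the function unrolls
  -- into independent adders, one per array element.
  avg_out <= average(a_in, b_in);
end architecture arch;
```

An operator such as "+" could be overloaded for LargeByteArray in the same package in the same way, but a named function makes the divide-by-two explicit. As with the for loop in a process, the loop inside the function describes parallel hardware, since no iteration depends on another.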