Tags: arrays, algorithm, vhdl, fpga, xilinx

More resource efficient way to get the maximum of the last 512 values


I have written some VHDL code that stores the last 512 values of an input signal and calculates the largest of the stored values. The code works but uses a lot of my FPGA's LUT resources. Is there a more resource-efficient way to calculate the largest of the last 512 samples? (It is important that the result is the largest of the last 512 values, not the largest value ever observed on that input; the latter could easily be achieved by storing a single number.)

Alternatively, is there some way I could write the VHDL so that the synthesiser implements the array as block RAM (BRAM) instead of LUTs?

The synthesiser I am using is LabVIEW FPGA (which I believe uses Xilinx ISE internally to compile/synthesise the VHDL).

My current code for this is shown below:

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity RecentMax is
  port (
    clk : in std_logic;
    reset : in std_logic;
    InputSignal : in std_logic_vector(15 downto 0);
    Max : out std_logic_vector(15 downto 0)
    );
end RecentMax;

architecture RTL of RecentMax is
-- declarations
  type Array512 is array(0 to 511) of signed(15 downto 0);
  signal PastVals : Array512;
  type Array256 is array(0 to 255) of signed(15 downto 0);
  signal result : Array256;

  signal CalculationState : unsigned(1 downto 0);
  signal NLeftToCompute : unsigned(8 downto 0);
begin
-- behaviour
  process(clk)
  begin
    if(rising_edge(clk)) then
      if(reset = '1') then
        -- reset values
        for i in PastVals'low to PastVals'high loop
          PastVals(i) <= (others => '0');
        end loop;
        for i in result'low to result'high loop
          result(i) <= (others => '0');
        end loop;
        CalculationState <= to_unsigned(0, 2);
        Max <= std_logic_vector(to_signed(0, 16));
        NLeftToCompute <= to_unsigned(256, 9);
      else
        -- do stuff
        case to_integer(CalculationState) is
          when 0 =>
            for i in PastVals'low to PastVals'high-1 loop
              PastVals(i+1) <= PastVals(i);
            end loop;
            PastVals(0) <= signed(InputSignal);
            Max <= std_logic_vector(result(0));
            NLeftToCompute <= to_unsigned(256, 9);
            CalculationState <= to_unsigned(1, 2);
          when 1 =>
            for i in 0 to 255 loop
              if (i <= to_integer(NLeftToCompute)-1) then
                if PastVals(i*2) > PastVals(i*2+1) then
                  result(i) <= PastVals(i*2);
                else
                  result(i) <= PastVals(i*2+1);
                end if;
              end if;
            end loop;
            NLeftToCompute <= shift_right(NLeftToCompute, 1);
            CalculationState <= to_unsigned(2, 2);
          when 2 =>
            for i in 0 to 127 loop
              if (i <= to_integer(NLeftToCompute)-1) then
                if result(i*2) > result(i*2+1) then
                  result(i) <= result(i*2);
                else
                  result(i) <= result(i*2+1);
                end if;
              end if;
            end loop;
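            -- keep halving until a single value is left in result(0);
            -- stopping at 2 would leave the final pair uncompared
            if NLeftToCompute > 1 then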
              NLeftToCompute <= shift_right(NLeftToCompute, 1);
            else
              CalculationState <= to_unsigned(0, 2);
            end if;
          when others =>
            -- do nothing - shouldn't get here
            null;
        end case;
      end if;
    end if;
  end process;
end RTL;

Solution

  • For this particular application it was sufficient to update the maximum only once every 512 clock cycles. My updated code is shown below. I'd still be interested in an answer to the original question: is there a more resource-efficient method that produces the maximum of the last 512 values in a small number of clock cycles?

    Code Solution:

    library ieee;
    use ieee.std_logic_1164.all;
    use ieee.numeric_std.all;
    
    entity RecentMax is
      port (
        clk : in std_logic;
        reset : in std_logic;
        InputSignal : in std_logic_vector(15 downto 0);
        Max : out std_logic_vector(15 downto 0)
        );
    end RecentMax;
    
    architecture RTL of RecentMax is
    -- declarations
    signal counter : integer range 0 to 511;  -- constrained so it is not built as a 32-bit counter
    signal RecentMax : signed(15 downto 0);
    
    begin
    -- behaviour
      process(clk)
      begin
        if(rising_edge(clk)) then
          if(reset = '1') then
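            -- reset values
            counter <= 0;
            RecentMax <= to_signed(0, 16);
            Max <= std_logic_vector(to_signed(0, 16));
          else
            if counter = 0 then
              -- publish the maximum of the block that just finished and
              -- start the next block from the current sample, so that all
              -- 512 samples are compared and negative maxima are not clamped to 0
              Max <= std_logic_vector(RecentMax);
              RecentMax <= signed(InputSignal);
              counter <= counter + 1;
            else
              if signed(InputSignal) > RecentMax then
                RecentMax <= signed(InputSignal);
              end if;
              if counter >= 511 then
                counter <= 0;
              else
                counter <= counter + 1;
              end if;
            end if;
          end if;
        end if;
      end process;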
    end RTL;
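
    On the block RAM part of the question, here is a minimal sketch of a 512-sample buffer written in the style that Xilinx synthesis tools usually map to BRAM: one write address, one registered (synchronous) read, and no reset loop over the array. The array in the original code is read in many places at once and cleared on reset, which generally forces it into registers and LUTs. The entity name SampleBuffer and the port Oldest are hypothetical names chosen for illustration, and this is untested under LabVIEW FPGA, so treat it as a starting point rather than a confirmed recipe.

    ```vhdl
    library ieee;
    use ieee.std_logic_1164.all;
    use ieee.numeric_std.all;

    -- Hypothetical entity: a 512-deep circular sample buffer (not from the original post)
    entity SampleBuffer is
      port (
        clk         : in  std_logic;
        InputSignal : in  std_logic_vector(15 downto 0);
        Oldest      : out std_logic_vector(15 downto 0)
        );
    end SampleBuffer;

    architecture RTL of SampleBuffer is
      type Ram512 is array(0 to 511) of std_logic_vector(15 downto 0);
      -- initial value instead of a reset loop; a reset over the whole array
      -- would prevent the array from being mapped to a memory
      signal ram  : Ram512 := (others => (others => '0'));
      signal addr : unsigned(8 downto 0) := (others => '0');
    begin
      process(clk)
      begin
        if rising_edge(clk) then
          -- read-before-write at the same address: the value read is the
          -- sample that was written 512 clock edges earlier
          Oldest <= ram(to_integer(addr));
          ram(to_integer(addr)) <= InputSignal;
          addr <= addr + 1;  -- 9-bit unsigned wraps from 511 back to 0
        end if;
      end process;
    end RTL;
    ```

    This only provides the storage. Because a BRAM exposes one or two addresses per clock cycle, the maximum can no longer be taken over all 512 entries in parallel, so the comparison logic would have to be restructured around sequential reads.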