aes vhdl fpga

VHDL core synthesis and implementation in Vivado


I am currently developing an AES encryption core for a Pynq-Z1 FPGA board. I would like to see how the logic is routed in the FPGA fabric, along with the timing summary of the design.

The project synthesises, but it produces a warning saying that I am exceeding the number of IOBs available on the package. This is understandable because the core takes in and outputs a 4 x 4 matrix.

Instead, I would like to have "internal I/O" so that I can see the routing on the FPGA fabric. How would I go about doing this? Currently, the device view shows an empty topology (shown below), but my synthesised design utilises 4148 LUTs and 389 FFs. I expect to see some CLBs highlighted.

design device view

I would appreciate any feedback and references to application notes that might further my understanding of FPGAs.

Cheers


Solution

  • You can use a simple wrapper around your core with a serial interface. Something like:

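    library ieee;
    use ieee.std_logic_1164.all;
    
    entity wrapper is
      port(clk, rst, dsi, dsi_core, shift_out: in std_ulogic;
           di: in std_ulogic_vector(7 downto 0);
           dso_core: out std_ulogic;
           do: out std_ulogic_vector(7 downto 0)
         );
    end entity wrapper;
    
    architecture rtl of wrapper is
    
      signal di_core, do_core, do_buffer: std_ulogic_vector(127 downto 0);
      -- internal copy of dso_core: an out port cannot be read before VHDL-2008
      signal dso_core_i: std_ulogic;
    
    begin
    
      -- positional association: assumes the port order of core is
      -- clk, rst, input strobe, 128-bit input, output strobe, 128-bit output
      u0: entity work.core(rtl)
        port map(clk, rst, dsi_core, di_core, dso_core_i, do_core);
    
      dso_core <= dso_core_i;
    
      -- shift one input byte per clock cycle into the 128-bit input register
      input_process: process(clk)
      begin
        if rising_edge(clk) then
          if rst = '1' then
            di_core <= (others => '0');
          elsif dsi = '1' then
            di_core <= di & di_core(127 downto 8);
          end if;
        end if;
      end process input_process;
    
      -- capture the core result, then shift it out one byte at a time
      output_process: process(clk)
      begin
        if rising_edge(clk) then
          if rst = '1' then
            do_buffer <= (others => '0');
          elsif dso_core_i = '1' then
            do_buffer <= do_core;
          elsif shift_out = '1' then
            do_buffer <= do_buffer(119 downto 0) & X"00";
          end if;
        end if;
      end process output_process;
    
      -- the leftmost byte of the output register is always visible
      do <= do_buffer(127 downto 120);
    
    end architecture rtl;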
    

    The wrapper receives its input one byte at a time (when dsi = '1') and shifts each byte into a 128-bit register that is connected to the 128-bit input of your core. Once 16 bytes have been entered, the environment asserts dsi_core to tell the core that the 128-bit input can be sampled and processed. The environment then waits until the core asserts dso_core, which signals that processing is over and that the 128-bit result is available on the do_core output port of the core. When dso_core is asserted, the wrapper samples do_core into a second 128-bit register (do_buffer). The environment can now read the leftmost byte of do_buffer, which drives the do output port of the wrapper. To read the next byte, the environment asserts shift_out, which shifts do_buffer one byte to the left, and so on.

    This kind of wrapper is very common practice when you want to test a sub-component of a larger system in real hardware. The number of I/Os of a sub-component frequently exceeds the number of I/Os available on the device, and a serial input/output interface solves this: the wrapper above needs only 22 pins instead of more than 256. Of course there is a significant latency overhead due to the I/O operations (at least 16 clock cycles to load a block and 16 to read it back), but it is just for testing, isn't it?
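
The handshake described above can be tried in simulation before going to the board. What follows is only a minimal sketch, not the real design: the `core` entity here is a hypothetical stand-in that registers its input and answers one cycle later (its port names are assumptions, only the port order matters for the wrapper's positional port map), and `wrapper_tb` is an illustrative testbench name. It assumes the wrapper shown earlier is compiled into the `work` library.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical stand-in for the AES core: it only registers its input.
entity core is
  port(clk, rst, dsi: in std_ulogic;
       di: in std_ulogic_vector(127 downto 0);
       dso: out std_ulogic;
       do: out std_ulogic_vector(127 downto 0));
end entity core;

architecture rtl of core is
begin
  process(clk)
  begin
    if rising_edge(clk) then
      if rst = '1' then
        dso <= '0';
        do  <= (others => '0');
      else
        dso <= dsi;              -- result is "ready" one cycle after dsi
        if dsi = '1' then
          do <= di;
        end if;
      end if;
    end if;
  end process;
end architecture rtl;

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity wrapper_tb is
end entity wrapper_tb;

architecture sim of wrapper_tb is
  -- component declaration so this file can be analysed before the wrapper
  component wrapper is
    port(clk, rst, dsi, dsi_core, shift_out: in std_ulogic;
         di: in std_ulogic_vector(7 downto 0);
         dso_core: out std_ulogic;
         do: out std_ulogic_vector(7 downto 0));
  end component wrapper;

  signal clk: std_ulogic := '0';
  signal rst, dsi, dsi_core, shift_out: std_ulogic := '0';
  signal di: std_ulogic_vector(7 downto 0) := (others => '0');
  signal dso_core: std_ulogic;
  signal do: std_ulogic_vector(7 downto 0);
  signal done: boolean := false;
begin
  clk <= not clk after 5 ns when not done else '0';

  dut: wrapper
    port map(clk => clk, rst => rst, dsi => dsi, dsi_core => dsi_core,
             shift_out => shift_out, di => di, dso_core => dso_core, do => do);

  stim: process
  begin
    rst <= '1';
    wait until rising_edge(clk);
    wait until rising_edge(clk);
    rst <= '0';
    -- shift in 16 bytes with values 0, 1, ..., 15
    for i in 0 to 15 loop
      di  <= std_ulogic_vector(to_unsigned(i, 8));
      dsi <= '1';
      wait until rising_edge(clk);
    end loop;
    dsi <= '0';
    -- tell the core that the 128-bit input is complete
    dsi_core <= '1';
    wait until rising_edge(clk);
    dsi_core <= '0';
    -- wait for the clock edge on which the wrapper captures the result
    wait until rising_edge(clk) and dso_core = '1';
    -- read 16 bytes back; the last byte shifted in is the first one out
    for i in 15 downto 0 loop
      wait until falling_edge(clk);
      assert do = std_ulogic_vector(to_unsigned(i, 8))
        report "unexpected byte at step " & integer'image(i)
        severity failure;
      shift_out <= '1';
      wait until rising_edge(clk);
      shift_out <= '0';
    end loop;
    report "wrapper_tb finished";
    done <= true;
    wait;
  end process stim;
end architecture sim;
```

Analyse this file first, then the wrapper, then elaborate `wrapper_tb`. Note the byte order this exposes: with the shift directions used in the wrapper, bytes are read back in the reverse of the order in which they were shifted in, so the environment has to account for that (or one of the two shift directions has to be flipped).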