I am currently developing an AES encryption core for a Pynq-Z1 FPGA board. I would like to see how the logic is routed on the FPGA fabric, as well as the timing summary of the design.
The project synthesises, but with a warning saying that I am exceeding the number of IOBs available on the package. This is understandable because the core takes in and outputs a 4 x 4 matrix.
Instead, I would like to have "internal I/O" in order to see the routing on the FPGA fabric. How would I go about doing this? Currently, the device view shows an empty topology (shown below), but my synthesised design utilises 4148 LUTs and 389 FFs. I expect to see some CLBs highlighted.
I appreciate any feedback and reference to any application notes which might further progress my FPGA understanding.
Cheers
You can use a simple wrapper around your core with a serial interface. Something like:
library ieee;
use ieee.std_logic_1164.all;

entity wrapper is
  port(clk, rst, dsi, dsi_core, shift_out: in std_ulogic;
       di: in std_ulogic_vector(7 downto 0);
       dso_core: out std_ulogic;
       do: out std_ulogic_vector(7 downto 0)
  );
end entity wrapper;

architecture rtl of wrapper is
  signal di_core, do_core, do_buffer: std_ulogic_vector(127 downto 0);
begin

  -- Core under test; ports are associated by position.
  u0: entity work.core(rtl)
    port map(clk, rst, dsi_core, di_core, dso_core, do_core);

  -- Shift one input byte per clock cycle into di_core while dsi is high.
  input_process: process(clk)
  begin
    if rising_edge(clk) then
      if rst = '1' then
        di_core <= (others => '0');
      elsif dsi = '1' then
        di_core <= di & di_core(127 downto 8);
      end if;
    end if;
  end process input_process;

  -- Capture the core output when dso_core is high, then shift it out
  -- one byte at a time, leftmost byte first, while shift_out is high.
  -- Note: reading the out port dso_core is legal in VHDL-2008; with
  -- older standards, route it through an internal signal instead.
  output_process: process(clk)
  begin
    if rising_edge(clk) then
      if rst = '1' then
        do_buffer <= (others => '0');
      elsif dso_core = '1' then
        do_buffer <= do_core;
      elsif shift_out = '1' then
        do_buffer <= do_buffer(119 downto 0) & X"00";
      end if;
    end if;
  end process output_process;

  do <= do_buffer(127 downto 120);

end architecture rtl;
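The positional port map in the wrapper implies a particular port order on the core. As a rough sketch of what that interface might look like, here is a hypothetical placeholder `core`. The port names (`dsi`, `di`, `dso`, `do`) and the stand-in behaviour (inverting the input one cycle after the strobe) are assumptions made for illustration only; the real AES core would replace the architecture body.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical placeholder for the real AES core. Only the order and the
-- types of the ports matter for the positional port map in the wrapper;
-- the port names themselves are assumptions.
entity core is
  port(
    clk : in  std_ulogic;
    rst : in  std_ulogic;
    dsi : in  std_ulogic;                       -- input data strobe
    di  : in  std_ulogic_vector(127 downto 0);  -- 128-bit input block
    dso : out std_ulogic;                       -- output data strobe
    do  : out std_ulogic_vector(127 downto 0)   -- 128-bit output block
  );
end entity core;

architecture rtl of core is
begin
  -- Stand-in behaviour (not AES): one cycle after dsi is sampled high,
  -- pulse dso and present the bitwise inverse of the input block.
  process(clk)
  begin
    if rising_edge(clk) then
      if rst = '1' then
        dso <= '0';
        do  <= (others => '0');
      else
        dso <= dsi;
        if dsi = '1' then
          do <= not di;
        end if;
      end if;
    end if;
  end process;
end architecture rtl;
```

With a stub like this the wrapper can be elaborated and simulated before the real core is plugged in, which helps to separate wrapper bugs from core bugs.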
The wrapper receives its input one byte at a time (when dsi = '1') and shifts it into a 128-bit register that is connected to the 128-bit input of your core. Once 16 bytes have been entered, the environment asserts dsi_core to tell the core that the 128-bit input can be sampled and processed. The environment then waits until the core asserts dso_core, signalling that processing is over and that the 128-bit output is available on the do_core output port of core. When dso_core is asserted, the wrapper samples do_core into a 128-bit register (do_buffer). The environment can now read the leftmost byte of do_buffer, which drives the do output port of the wrapper. The environment asserts shift_out to shift do_buffer one byte to the left and read the next byte, and so on.
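The sequence described above can be sketched as a small simulation testbench playing the role of the environment. This is only an illustration: it assumes the `wrapper` entity from this answer and some `core` entity are compiled into library `work`, and the testbench name `wrapper_tb`, the 10 ns clock period, and the test pattern (bytes 0 to 15) are arbitrary choices.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity wrapper_tb is
end entity wrapper_tb;

architecture sim of wrapper_tb is
  signal clk: std_ulogic := '0';
  signal rst, dsi, dsi_core, shift_out: std_ulogic := '0';
  signal di: std_ulogic_vector(7 downto 0) := (others => '0');
  signal dso_core: std_ulogic;
  signal do: std_ulogic_vector(7 downto 0);
  signal done: boolean := false;
begin
  -- 10 ns clock (arbitrary), stopped once the stimulus is finished.
  clk <= not clk after 5 ns when not done else '0';

  dut: entity work.wrapper(rtl)
    port map(clk => clk, rst => rst, dsi => dsi, dsi_core => dsi_core,
             shift_out => shift_out, di => di, dso_core => dso_core,
             do => do);

  stimulus: process
  begin
    -- Synchronous reset for two clock cycles.
    rst <= '1';
    wait until rising_edge(clk);
    wait until rising_edge(clk);
    rst <= '0';

    -- Shift in 16 bytes, one per clock cycle, while dsi is high.
    dsi <= '1';
    for i in 0 to 15 loop
      di <= std_ulogic_vector(to_unsigned(i, 8));
      wait until rising_edge(clk);
    end loop;
    dsi <= '0';

    -- Tell the core that the 128-bit block is complete.
    dsi_core <= '1';
    wait until rising_edge(clk);
    dsi_core <= '0';

    -- Wait for the clock edge at which dso_core is high; the wrapper
    -- loads do_buffer at that same edge.
    wait until rising_edge(clk) and dso_core = '1';

    -- Read the 16 output bytes, leftmost byte first.
    for i in 0 to 15 loop
      wait until falling_edge(clk);  -- do is stable mid-cycle
      report "byte " & integer'image(i) & " = " &
             integer'image(to_integer(unsigned(do)));
      shift_out <= '1';              -- request the next byte
      wait until rising_edge(clk);
      shift_out <= '0';
    end loop;

    done <= true;
    wait;
  end process stimulus;
end architecture sim;
```

One detail worth keeping in mind when comparing results: with the wrapper as written, the first byte shifted in ends up in the least significant byte of di_core, while the output is read starting from the most significant byte of do_buffer.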
This kind of wrapper is very common practice when you want to test a sub-component of a larger system on real hardware. The number of I/Os of a sub-component frequently exceeds the number of I/Os available on the device, and serial input/output solves this. Of course there is a significant latency overhead due to the I/O operations, but it is just for testing, isn't it?