Determine if a module in SystemVerilog is synthesizable


I am implementing a max-pooling module on an FPGA using SystemVerilog. Each word is 64 bits wide, and the input data is a 28 by 28 grid of words (a 28x28-pixel image). The filter size is 2 by 2 words and the stride is 2.

The code runs well in simulation, and the synthesizer in Vivado did not report any non-synthesizable part. I also expected this code to synthesize well, because all the parameters are literals, so everything should be resolved at compile time. But when I asked ChatGPT this question, it said this code isn't synthesizable.

This is my code:

`timescale 1ns / 1ps

module max_pooling_module 
#(
    parameter N = 64,
    parameter WIDTH = 28,
    parameter FILTER_SIZE = 2,
    parameter STRIDE = 2
)
(
    input logic signed [WIDTH-1:0][WIDTH-1:0][N-1:0] inputImage,
    output logic signed [(WIDTH-FILTER_SIZE)/STRIDE:0][(WIDTH-FILTER_SIZE)/STRIDE:0][N-1:0] pooledImage
);

    // generate hardware to pool all windows concurrently
    generate 
        genvar row, col;
        
        for(row = WIDTH-1; row + 1 >= STRIDE; row -= STRIDE) begin : stride_down
            for(col = WIDTH-1; col + 1 >= STRIDE; col -= STRIDE) begin : stride_right
                // array holding all the elements that will be passed to the max module
                logic signed [N-1:0] arr [FILTER_SIZE*FILTER_SIZE];
                
                // add elements to arr
                always_comb begin
                    for(int i = 0; i < FILTER_SIZE; i++) begin
                        for(int j = 0; j < FILTER_SIZE; j++) begin
                            arr[i*FILTER_SIZE+j] = inputImage[row-i][col-j];
                        end
                    end
                end
                
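                // the max module has two ports: the input is an array of values to compare,
                // and the output receives the max value
                // (parameters are forwarded so the port widths still match if N or FILTER_SIZE change)
                max #(.FILTER_SIZE(FILTER_SIZE), .N(N)) pooling(.arr(arr), .maxValue(pooledImage[(row+1)/STRIDE-1][(col+1)/STRIDE-1]));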
            end
        end
    endgenerate 

endmodule

This is max module:

module max #(parameter FILTER_SIZE = 2, parameter N = 64) (
    input logic signed [N-1:0] arr[FILTER_SIZE*FILTER_SIZE],
    output logic signed [N-1:0] maxValue 
    );
    
    logic signed [N-1:0] res;
    always_comb begin
        res = arr[0];
        for (int i = 0; i < FILTER_SIZE*FILTER_SIZE; i++) begin
            if (res < arr[i]) res = arr[i];
            else res = res;
        end
    end
    
    assign maxValue = res;
    
endmodule

Is this code synthesizable? If not, what is the correct way of doing it?

This is the message I have after running synthesis and implementation:

[Image: screenshot of the Vivado messages, not reproduced here]

The I/O and LUT utilization are 31360% and 106%, respectively.

The synthesis status is shown as "complete".


Solution

  • I think it's synthesizable, because synthesis indicated that it finished (synthesis status "complete") and the tool inferred what was modeled in the RTL. The posted code models a bunch of compares, and (based on your comments) synthesis built and connected them using LUTs. The tool finished with no latches, which is good.

    Other indications that would be positive with respect to synthesis quality of results:

    • With the synthesized design open, F4 in Vivado should produce a schematic that is recognizable as your design. In your case you should see many instances of the max module.

    • A post-synthesis netlist was generated. Vivado stores the post-synthesis netlist in a container file with the .dcp extension. It's a lot like a zip file, and it can be viewed or extracted using 7zip or similar. It should contain a .edn or .edf file. In project mode (GUI) Vivado puts this file where it wants to, so you might need to use find or similar to locate it.

    A hard negative indication would be that the synthesis hangs for hours rather than completing the task.

    The biggest issue I see with the posted code is utilization, shown by the UTLZ-1 error in implementation, which means the design needs more LUTs than the targeted part provides. You have 56 pounds of stuff in a 53 pound box. Maybe reduce the word size from 64 to 32 bits. Alternatively, you may be able to use fewer max modules if you figure out a way to use them as an engine and multiplex the data in and out, storing results in registers. If you need to do N operations you don't necessarily need N operating engines; just time-share a single engine and store results as needed.
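    The time-sharing idea can be sketched as follows. This is only a rough illustration, not a drop-in replacement for the posted design: the module name max_pool_engine, its ports, and the one-window-per-clock handshake are all assumptions, and the window is hard-coded to the 2x2 case. Instead of 14*14 = 196 parallel max instances, one comparator tree is reused, and something upstream (for example a small state machine reading from memory) would have to present one window per cycle.

    ```systemverilog
    // Hypothetical time-shared pooling engine (all names are illustrative).
    // One comparator tree is reused every clock instead of instantiating
    // one max module per output word.
    module max_pool_engine #(
        parameter int N = 64                      // word width, as in the question
    ) (
        input  logic                clk,
        input  logic                rst,          // synchronous reset (assumed)
        input  logic                in_valid,     // a 2x2 window is presented this cycle
        input  logic signed [N-1:0] window [4],   // the four words of the window
        output logic                out_valid,    // pooled holds a new result
        output logic signed [N-1:0] pooled
    );
        logic signed [N-1:0] max01, max23, max_all;

        // Two-level comparator tree; all operands are signed,
        // so the comparisons are signed comparisons.
        always_comb begin
            max01   = (window[0] > window[1]) ? window[0] : window[1];
            max23   = (window[2] > window[3]) ? window[2] : window[3];
            max_all = (max01 > max23) ? max01 : max23;
        end

        // Register the result so it can be stored or forwarded downstream.
        always_ff @(posedge clk) begin
            if (rst) begin
                out_valid <= 1'b0;
                pooled    <= '0;
            end else begin
                out_valid <= in_valid;
                if (in_valid) pooled <= max_all;
            end
        end
    endmodule
    ```

    With this arrangement a full 28x28 image would take 196 cycles instead of being computed at once, trading throughput for a much smaller LUT count.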

    The design's I/O utilization is far greater than the number of I/O pins available in the targeted physical part. You will need to re-architect this to fit within the available I/O. This will probably involve the use of a RAM to store samples. Maybe this could be the start of another question about how to make it fit.
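    To put numbers on it: the posted ports expose 28*28*64 = 50176 input bits and 14*14*64 = 12544 output bits, so 62720 top-level pins in total. One possible direction, sketched below under several assumptions, is to stream one word per clock in raster order and keep only half a row of partial results in a small buffer. The module name max_pool_stream, the valid signalling, and the restriction to a 2x2 window with stride 2 and an even WIDTH are all choices made for this illustration, not something taken from the posted code.

    ```systemverilog
    // Hypothetical streaming 2x2 / stride-2 max pool (names are illustrative).
    // Words arrive one per clock in raster order; only WIDTH/2 partial
    // maxima are stored, so the port list stays small.
    module max_pool_stream #(
        parameter int N     = 64,
        parameter int WIDTH = 28                  // assumed even in this sketch
    ) (
        input  logic                clk,
        input  logic                rst,          // synchronous reset (assumed)
        input  logic                in_valid,
        input  logic signed [N-1:0] in_data,
        output logic                out_valid,
        output logic signed [N-1:0] out_data
    );
        localparam int HALF = WIDTH / 2;

        logic signed [N-1:0]      row_max [HALF]; // pair maxima of the upper row
        logic signed [N-1:0]      prev;           // left word of the current pair
        logic [$clog2(WIDTH)-1:0] col;            // column counter, 0..WIDTH-1
        logic                     row_odd;        // 1 on the lower row of a window
        logic signed [N-1:0]      pair_max;

        // Max of the current word and the word to its left.
        assign pair_max = (in_data > prev) ? in_data : prev;

        always_ff @(posedge clk) begin
            out_valid <= 1'b0;                    // default, overridden below
            if (rst) begin
                col     <= '0;
                row_odd <= 1'b0;
            end else if (in_valid) begin
                if (!col[0]) begin
                    prev <= in_data;              // first word of a pair
                end else if (!row_odd) begin
                    row_max[col >> 1] <= pair_max; // upper row: store partial max
                end else begin
                    // lower row: combine with the stored partial max and emit
                    out_data  <= (pair_max > row_max[col >> 1]) ? pair_max
                                                                : row_max[col >> 1];
                    out_valid <= 1'b1;
                end
                if (col == WIDTH - 1) begin
                    col     <= '0;
                    row_odd <= ~row_odd;
                end else begin
                    col <= col + 1'b1;
                end
            end
        end
    endmodule
    ```

    In this sketch the data ports are N bits in and N bits out plus a few control signals, rather than the whole image at once. Whether the buffer ends up in registers or RAM is up to the tool.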

    You won't be able to deploy a .bit file to hardware until the design fits in the part.

    Re the division operator:
    The / operator is used as part of the generate loop, which is unrolled at elaboration, so you are not asking the Verilog RTL for a divider; the way / is used should be OK. The divide you get is integer division.
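
    As a small illustration of that distinction (the module pool_dims and its ports a, b, out_width and q_runtime are made up for this example), a / between parameters is folded into a constant at elaboration, while a / between signals asks the tool for actual divider logic, if the tool supports that at all:

    ```systemverilog
    // Hypothetical example contrasting constant and run-time division.
    module pool_dims #(
        parameter int WIDTH       = 28,
        parameter int FILTER_SIZE = 2,
        parameter int STRIDE      = 2
    ) (
        input  logic [7:0] a, b,          // run-time operands, for contrast only
        output logic [7:0] out_width,     // driven by an elaboration-time constant
        output logic [7:0] q_runtime      // driven by run-time division
    );
        // Parameters only: evaluated once at elaboration, no divider is built.
        // Integer division truncates, e.g. (28 - 3) / 2 is 12, not 12.5.
        localparam int OUT_WIDTH = (WIDTH - FILTER_SIZE) / STRIDE + 1;

        assign out_width = 8'(OUT_WIDTH); // 14 with the default parameters

        // Signals as operands: this describes a divider in hardware.
        assign q_runtime = a / b;
    endmodule
    ```

    The posted code only uses the first form, in the port declaration and in the indices inside the generate loop.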