verilog system-verilog timing quartus intel-fpga

How to correctly calculate the frequency of the device in Timing Analyzer, Intel Quartus

I have three modules: a modulo remainder generator, a modulo adder, and a modulo Wallace adder. Their speeds should be related as follows: remainder_modulo > wallace_adder_modulo > adder_modulo. As far as I understand, Timing Analyzer gives me the maximum frequency of the device, but that is not what I need. I want to know the actual delay of each module so that the speeds correlate the way they should. Which reported figures should I rely on?

module remainder_modulo
#(parameter n)
(
    input wire [n-1:0] A, 
    input wire [n-1:0] P, 
    output wire [n:0] S,  
    output Po           
);
    wire [n:0] A_factor = {A, 1'b0};
    wire [n:0] P_extended = {1'b0, P};
    wire [n:0] S_temp;
    multidigitAdder #(.n(n+1)) multAdd(.A(A_factor), .B(P_extended), .Pi(1'b1), .S(S_temp), .Po(Po));
    assign S = Po ? S_temp : A_factor; 
endmodule

module adder_modulo
#(parameter n)
(
    input wire [n-1:0] A,
    input wire [n-1:0] B,
    input wire [n-1:0] P,
    output wire [n-1:0] S,
    output Po               
);
    wire [n-1:0] S_temp, S_temp_mod;
    multidigitAdder #(.n(n)) multAdd1(.A(A), .B(B), .Pi(1'b0), .S(S_temp));
    multidigitAdder #(.n(n)) multAdd2(.A(S_temp), .B(P), .Pi(1'b1), .S(S_temp_mod), .Po(Po));
    assign S = Po ? S_temp_mod : S_temp;
endmodule

module adder_wallace
#(parameter n)
(
    input wire [n-1:0] A,  
    input wire [n-1:0] B, 
    input wire [n-1:0] P,  
    input Pi,           
    output wire [n-1:0] S, 
    output Po               
);
    wire [n-1:0] S_arr, Po_arr;
    genvar i;
    generate
        for (i = 0; i < n; i = i + 1) begin : MEM
            bitAdder adder(A[i], B[i], P[i], S_arr[i], Po_arr[i]);
        end
    endgenerate

    wire [n:0] multi_B_arr = {Po_arr, Pi};
    wire [n:0] multi_A_arr = {1'b0, S_arr};
    multidigitAdder #(.n(n + 1)) mAdder(.A(multi_A_arr), .B(multi_B_arr), .Pi(1'b0), .S(S), .Po(Po));
endmodule

module adder_modulo_wallace
#(parameter n)
(
    input wire [n-1:0] A,
    input wire [n-1:0] B,
    input wire [n-1:0] P,
    output wire [n-1:0] S,
    output Po           
);
    wire [n-1:0] simpleSum, wallaceSum;
    multidigitAdder #(.n(n)) multAdd1(.A(A), .B(B), .Pi(1'b0), .S(simpleSum));
    adder_wallace #(.n(n)) add(.A(A), .B(B), .P(P), .Pi(1'b1), .S(wallaceSum), .Po(Po));
    assign S = Po ? wallaceSum : simpleSum;
endmodule

module multidigitAdder
#(parameter n)
(
    input wire [n-1:0] A,
    input wire [n-1:0] B,
    input Pi,
    output wire [n-1:0] S,
    output Po
);
    assign {Po, S} = A + B + Pi;
endmodule

remainder_modulo:

Maximum frequency: 165.65 MHz
Start node: cnt[0]
End node: reduce_modulo:reduce|multidigitAdder:multAdd|Add1~8_OTERM9
Slack: 16.642
Data delay: 3.31

wallace_adder_modulo:

Maximum frequency: 136.59 MHz
Start node: cnt[0]
End node: adder_modulo_wallace:addWallaceMod|S[3]~3_OTERM9
Slack: 17.084
Data delay: 2.75

adder_modulo:

Maximum frequency: 165.65 MHz
Start node: cnt[0]
End node: adder_modulo:addMod|multidigitAdder:multAdd2|Add1~6_OTERM9
Slack: 18.076
Data delay: 1.875

Solution

The Maximum Frequency figure is the limiting factor on performance.
The posted code will be implemented as combinational logic whose worst-case delay is 1/Maximum Frequency for the given module.

If the modules are implemented as part of a single-clock synchronous system, then the maximum clock rate of the system is set by the slowest module, which is wallace_adder_modulo at 136.59 MHz.
The delay to obtain a new sample from any module in that system is 1/136.59 MHz = 7.3212 ns.
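As a worked check of that arithmetic, the clock period is the reciprocal of the reported maximum frequency. The symbol T_min is shorthand introduced here for illustration; the frequencies are the ones reported in the question:

```latex
T_{\min} = \frac{1}{F_{\max}}, \qquad
\frac{1}{165.65\ \text{MHz}} \approx 6.037\ \text{ns}, \qquad
\frac{1}{136.59\ \text{MHz}} \approx 7.321\ \text{ns}
```

In a shared-clock system the two faster modules would still be clocked with the 7.321 ns period.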

Consider an assembly line of workers consisting of multiple workstations; the performance limiting factor of the line is the slowest station.

There is no expected, actual, or average delay reported by FPGA timing tools. There is no theoretical delay. The tools report the maximum delay so that designers can select a maximum clock frequency. If the delay through the logic is greater than the clock period, the design does not work. The assumption in synchronous design is that the logic produces one logical result per clock cycle.
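As a rough sketch of why the maximum is the number that matters, the usual setup check for a register-to-register path can be written as below. The symbols are generic textbook names, not anything the tool prints, and clock skew and uncertainty are ignored for simplicity:

```latex
T_{clk} \;\ge\; t_{co} + t_{logic,\max} + t_{su}
\quad\Longrightarrow\quad
F_{\max} = \frac{1}{t_{co} + t_{logic,\max} + t_{su}}
```

Here t_co is the clock-to-output delay of the launching register, t_logic,max is the worst-case delay through the combinational logic and routing, and t_su is the setup time of the capturing register.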
Vivado's timing analyzer, for example, has an options menu for choosing which delays to report. Other vendors' tools will be similar.

A theoretical delay could be postulated manually by mapping the design to theoretical gate delays; however, FPGAs don't target gates (they target the vendor's macro blocks), so those models don't exist within the scope of FPGA tools.

Since Vivado provides min delays, you could take (min + max)/2 as a typical value; however, I would not rely on that number for anything other than a thought experiment.

It looks like you synthesized the modules separately, without a top-level module to bring them together. The hardware implementation and performance numbers will change significantly when they are synthesized together, because the tool combines logic across the modules. The same will happen when they are combined into a synchronous system with registers/flip-flops.

If you want to understand the nature of the delays better, open the tool's RTL view and take a close look at how the logic was mapped to the vendor's hardware.

There is no need to attempt to align delays between modules in FPGA design. Put the modules in a synchronous system so that each module is surrounded by registers/flip-flops and the system acts as if each module produces a new answer on every clock edge.
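A minimal sketch of what "surrounded by registers" could look like for one of the posted modules is shown here. The wrapper name adder_modulo_timed, the clk port, the register names, and the default width of 8 are all assumptions made for illustration; only adder_modulo and its ports come from the question.

```verilog
// Hypothetical timing wrapper (not from the original post): registers on both
// sides of adder_modulo give the timing analyzer a register-to-register path
// through the module.
module adder_modulo_timed
#(parameter n = 8)                // default width is an assumption
(
    input  wire         clk,      // assumed single system clock
    input  wire [n-1:0] A,
    input  wire [n-1:0] B,
    input  wire [n-1:0] P,
    output reg  [n-1:0] S,
    output reg          Po
);
    reg  [n-1:0] A_r, B_r, P_r;   // input registers (launch side)
    wire [n-1:0] S_comb;          // combinational result from the module
    wire         Po_comb;

    // The module under measurement sits between the two register stages.
    adder_modulo #(.n(n)) dut (.A(A_r), .B(B_r), .P(P_r), .S(S_comb), .Po(Po_comb));

    always @(posedge clk) begin
        A_r <= A;                 // capture the inputs
        B_r <= B;
        P_r <= P;
        S   <= S_comb;            // capture the result of the previous inputs
        Po  <= Po_comb;
    end
endmodule
```

With a clock constraint on clk, the reported worst-case path from A_r/B_r/P_r to S/Po should then reflect this module's logic rather than whatever surrounding logic drives it. The same wrapper pattern can be repeated for remainder_modulo and adder_modulo_wallace, and the synthesizer may still restructure or retime the logic, so the numbers remain a worst-case bound rather than a typical delay.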