I'm implementing a hashing algorithm in Verilog using Vivado 2019.2.1. Everything (including synthesis and implementation) worked quite well, but I recently noticed that the result of the behavioral simulation (the correct hash digest) differs from the post-synthesis/-implementation functional and timing simulations, i.e. I receive three different values for the same circuit design/code.
My base configuration contained a testbench using the default `` `timescale 1ns / 1ps `` and a `#1` delay for toggling the clock register. I further constrained the clock to a frequency of 10 MHz using an XDC file. During synthesis, no errors (or even warnings, except some "parameter XYZ is used before its declaration") are shown, and blocking and non-blocking assignments are not mixed anywhere in my code. Nevertheless, I noticed that the post-* simulations (no matter whether functional or timing) need more clock cycles (e.g. 58 instead of 50 until the value of a specific register is toggled) to reach the same state of the circuit. My design is entirely synchronous and driven by a single clock.
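For reference, the clock generation described above corresponds to something like the following (module and signal names are illustrative):

```verilog
`timescale 1ns / 1ps  // time unit 1 ns, precision 1 ps

module tb;
    reg i_clk = 1'b0;

    // With a time unit of 1 ns, #1 toggles the clock every 1 ns,
    // giving a 2 ns period (500 MHz) -- much faster than the 10 MHz
    // constrained in the XDC file, which the simulator never sees.
    always #1 i_clk = ~i_clk;
endmodule
```

Note that this means the post-implementation timing simulation is clocking the annotated netlist at 500 MHz, which may be related to the `$period` violations shown below.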
This brought me to the Timing Report, where I noticed that 10 input and 10 output delays are not constrained. In addition, the Design Timing Summary shows a worst negative slack for setup that is very close to the length of one clock cycle. I tried some combinations of input and output delays following the Vivado documentation and tutorial videos, but I'm not sure how to determine which values are suitable. The total slack (TNS, THS and TPWS) is zero.
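For context, the constraints in question look roughly like this; the port names are illustrative and the delay values are placeholders (they would normally come from the timing of whatever external device drives or receives the pins, not from trial and error):

```tcl
# 10 MHz clock: 100 ns period on the clock input port
create_clock -period 100.000 -name sys_clk [get_ports i_clk]

# Illustrative input/output delay constraints relative to that clock.
# -max/-min values should reflect the external device's datasheet timing.
set_input_delay  -clock sys_clk -max 10.000 [get_ports {i_data[*]}]
set_input_delay  -clock sys_clk -min  2.000 [get_ports {i_data[*]}]
set_output_delay -clock sys_clk -max 10.000 [get_ports {o_digest[*]}]
set_output_delay -clock sys_clk -min  1.000 [get_ports {o_digest[*]}]
```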
Furthermore, I tried to reduce the clock frequency, because the propagation delay of some signals that control logic in the FSM (= top) module might be too large. The strange thing that happened then is that the simulation never reached the `$finish;` in my testbench, and nothing except the clock register changed its value in the waveform. The behavioral simulation works as expected, but it doesn't seem to be influenced by constraints or even timing at all. Monitoring the `o_round_done` wire (driven by an LFSR in a separate submodule) in my testbench, I noticed that in the behavioral simulation the value of this wire changes with the clock, whereas in the post-* simulations the value changes with a small delay:
Behavioral Simulation
clock cycles: 481, round_done: 0
clock cycles: 482, round_done: 1
clock cycles: 483, round_done: 0
total of 1866 clock cycles
Post-Implementation Functional Simulation
clock cycles: 482, round_done: 0
clock cycles: 482, round_done: 1
clock cycles: 483, round_done: 1
clock cycles: 483, round_done: 0
total of 1914 clock cycles
Post-Implementation Timing Simulation
WARNING: "C:\Xilinx\Vivado\2019.2\data/verilog/src/unisims/BUFG.v" Line 57: Timing violation in scope /tb/fsm/i_clk_IBUF_BUFG_inst/TChk57_10300 at time 997845 ps $period (posedge I,(0:0:0),notifier)
WARNING: "C:\Xilinx\Vivado\2019.2\data/verilog/src/unisims/BUFG.v" Line 56: Timing violation in scope /tb/fsm/i_clk_IBUF_BUFG_inst/TChk56_10299 at time 998845 ps $period (negedge I,(0:0:0),notifier)
The simulation never stops (probably because `round_done` is never 1).
Do you know what I'm doing wrong here? I'm wondering why the circuit does not behave correctly at very low clock frequencies (e.g. 500 kHz), as, to my knowledge, this should give every signal enough time to "travel" to its destination.
Another thing I noticed is that one wire that is assigned to a register in a submodule is `8'bXX` in the behavioral simulation until the connected register is "filled", but in the post-* simulations it is `8'b00` from the beginning. Any idea here?
Moreover, what actually defines the clock frequency for the simulations: the values in the testbench (`` `timescale `` and the `#` delay) or the constraint in the XDC file?
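For concreteness, if the testbench is what matters, this is what I would write for a 10 MHz simulation clock (a sketch, assuming the default timescale):

```verilog
`timescale 1ns / 1ps  // time unit 1 ns, precision 1 ps

module tb;
    reg i_clk = 1'b0;

    // Toggling every 50 time units (50 ns) gives a 100 ns period,
    // i.e. 10 MHz -- matching create_clock -period 100.000 in the XDC.
    always #50 i_clk = ~i_clk;
endmodule
```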
I found an explanation for why the post-* simulations behave differently from the behavioral simulation w.r.t. clock cycles etc. in the Xilinx Vivado Design Suite User Guide for Logic Simulation (UG900).
The "latency" before the actual computation of the design can start is caused by the Global Set and Reset (GSR), which takes 100 ns:

> The `glbl.v` file declares the global GSR and GTS signals and automatically pulses GSR for 100 ns. (p. 217)
Consequently, I solved the issue by letting the testbench wait for the control logic (= finite-state machine) to be ready, i.e. to change to the state after RESET.
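A sketch of what the wait looks like in the testbench (the `o_ready` signal and port names are illustrative; in my design I actually watch the FSM state leaving RESET):

```verilog
`timescale 1ns / 1ps

module tb;
    reg  i_clk = 1'b0;
    wire o_ready;   // illustrative: high once the FSM has left RESET

    always #50 i_clk = ~i_clk;

    // fsm dut (.i_clk(i_clk), .o_ready(o_ready), /* ... */);

    initial begin
        // In post-synthesis/-implementation simulation, GSR holds all
        // flip-flops for the first 100 ns, so wait until the design has
        // actually come out of reset before applying any stimulus.
        wait (o_ready === 1'b1);
        @(posedge i_clk);
        // ... drive the message, wait for the digest, check it ...
        $finish;
    end
endmodule
```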