Can't get Mealy FSM simulation working after synthesis

I am trying to design a non-overlapping sequence detector according to the following state-machine:

I wrote the following code in SystemVerilog:

typedef enum { S0, S1, S2, S3 } State;

module ass26(
    input sysclk, rst, in,
    output out
    );

    State state, stateNext;
    logic dout;

    always @(posedge sysclk) begin
        if (rst)
            state <= S0;
        else
            state <= stateNext;
    end

    // next-state logic
    always_comb begin
        stateNext = state;

        case (state)
            S0 : begin
                if (in == 1'b1)
                    stateNext = S1;
            end

            S1: begin
                if (in == 1'b0)
                    stateNext = S2;
            end

            S2: begin
                if (in == 1'b1)
                    stateNext = S1;
                else
                    stateNext = S3;
            end

            S3: begin
                stateNext = S0;
            end
            default: stateNext = S0;
        endcase
    end

    // output logic
    always_comb begin
        dout = (state == S3) & in;
    end

    assign out = dout;
endmodule

using the "3-process" method (I know this is not what experienced engineers tend to use, but that shouldn't matter).

I also have the following testbench:

`timescale 1ns / 1ps


module ass26_tb();

reg sysclk, rst, in, out;

ass26 dut (.sysclk(sysclk), .rst(rst), .in(in), .out(out));

initial begin
    sysclk = 0;
    forever #10 sysclk = ~sysclk;
end

initial begin
    rst = 1'b1;
    in = 1'b0;
    #15;
    
    rst = 0;
    in = 0;
    #10;
    
    in = 1; #20;
    in = 0; #20;
    in = 1; #20;
    in = 0; #20;
    in = 0; #20;
    in = 1; #20;
    $finish();
end

endmodule

When I simulate the behavioural model, I get the following, expected, result

with the output reacting asynchronously to the current state and the input, just as we would expect from a Mealy machine.

The behavioural schematic produced by Vivado looks like this:

Now to the actual issue. After synthesizing the design, the simulation no longer works, i.e., I get the following post-synthesis simulation results:

The post-synthesis schematic:

I cannot understand the issue here. I did VHDL 10 years ago at university and haven't touched FPGAs since. Now I am trying to learn Verilog/SystemVerilog and go back to basics. I rewrote the code in pure Verilog and even tried other examples from the Internet, but with the same result. What am I missing here?

Appreciate the help!

Solution

Several things; 1-3 are important, but I think 4 is the root cause.

1. Recommend specifying the state encoding like this:

typedef enum logic [1:0] { S0=2'b00, S1=2'b01, S2=2'b10, S3=2'b11 } State;

The exact encoded values don't matter for this example; however, knowing what they are helps with debugging. Watch the synthesis log for messages about the state register, because the tool will sometimes change the encodings.

This should help determine what the state machine is doing in the post-synthesis design. If the tool does not change the encodings, then asserting reset places state at 2'b00.

2. I changed the number of bits from 32 (an enum without an explicit base type defaults to int, which is 32 bits) to 2. The post-synthesis design uses 3 bits, which probably does not make sense for 4 states.

3. Recommend holding the reset asserted for a couple of clocks longer to make sure the design is getting reset.

4. Xilinx has a somewhat hard-to-find asynchronous reset called GSR (global set/reset) built into the hardware, which is asserted for 100 ns at startup. The GSR is not inferred from RTL code; it is natively part of the FPGA. Stimulus during the first 100 ns of a post-synthesis or timing simulation is effectively ignored because the flip-flops are all held in reset by the GSR.

Read more about the Xilinx GSR and its use in a timing simulation here: GSR, or search the internet for it.

Here is a snip of the reference:

In post-synthesis and post-implementation simulations, the GSR signal is automatically asserted for the first 100 ns to simulate the reset that occurs after configuration.

The solution is to delay the testbench stimulus by at least 100 ns, thus waiting until the GSR is de-asserted.

Here is the testbench procedural block with a 120 ns delay near the beginning.

    initial begin
        rst = 1'b1;
        in = 1'b0;
        // delay the stimulus until after GSR release
        #120;
        //
        #15;
        
        rst = 0;
        in = 0;
        #10;
        
        in = 1; #20;
        in = 0; #20;
        in = 1; #20;
        in = 0; #20;
        in = 0; #20;
        in = 1; #20;
        $finish();
    end
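Points 3 and 4 can also be combined in one testbench. The following is only a sketch under a few assumptions: the module name ass26_tb_gsr, the parameter GSR_WAIT_NS and the task drive are made up for illustration, and the 120 ns wait assumes the 100 ns GSR window quoted above. It holds rst for a few extra clocks after the GSR window and changes in on the falling clock edge, so the input is stable around each rising edge.

```systemverilog
`timescale 1ns / 1ps

module ass26_tb_gsr;  // hypothetical testbench name

    logic sysclk = 1'b0;
    logic rst, in;
    wire  out;

    // Assumption: GSR is asserted for the first 100 ns of the simulation,
    // so wait a little longer than that before doing anything useful.
    localparam int GSR_WAIT_NS = 120;

    ass26 dut (.sysclk(sysclk), .rst(rst), .in(in), .out(out));

    // 20 ns clock period, as in the original testbench
    always #10 sysclk = ~sysclk;

    // Apply one input bit on the falling edge so that it is stable
    // around the next rising edge.
    task automatic drive(input logic bit_in);
        @(negedge sysclk);
        in = bit_in;
    endtask

    initial begin
        rst = 1'b1;
        in  = 1'b0;

        #(GSR_WAIT_NS);                 // wait until GSR has been released
        repeat (3) @(posedge sysclk);   // keep rst asserted for 3 more clocks
        @(negedge sysclk) rst = 1'b0;

        // Same input sequence as the original testbench: 1 0 1 0 0 1
        drive(1'b1); drive(1'b0); drive(1'b1);
        drive(1'b0); drive(1'b0); drive(1'b1);

        @(negedge sysclk);
        $finish;
    end

    // Print every change so behavioural and post-synthesis runs can be compared
    initial $monitor("%0t rst=%b in=%b out=%b", $time, rst, in, out);

endmodule
```

Running the same testbench against both the behavioural model and the post-synthesis netlist and comparing the printed lines is a quick way to see whether the two still disagree once the GSR window is out of the way.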

Follow up: This one gets outside the scope of learning Verilog/SystemVerilog/VHDL and is more in the domain of FPGA hardware knowledge. There are a lot of quirky, specialized things to know about FPGA hardware.

In my experience, FPGA designers don't do a lot of post-synthesis simulations; rather, they rely on good coding practices, behavioural simulation, and post-route static timing analysis to verify the design. ASIC design flows might be different because of the NRE cost.
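Coming back to point 1: if the synthesis log shows that the state register was re-encoded, a synthesis attribute can ask the tool to leave it alone. The sketch below assumes Vivado synthesis and its fsm_encoding attribute; the value "none" is meant to disable FSM extraction so that the enum values are kept as written, but confirm this in the synthesis log for your tool version. The module name ass26_fixed_encoding is hypothetical; the logic is the same as in the question.

```systemverilog
typedef enum logic [1:0] { S0 = 2'b00, S1 = 2'b01, S2 = 2'b10, S3 = 2'b11 } State;

module ass26_fixed_encoding (   // hypothetical name, same behaviour as ass26
    input  logic sysclk, rst, in,
    output logic out
);

    // Assumption: Vivado synthesis attribute. "none" asks the tool not to
    // extract and re-encode the FSM, so state should keep the 2-bit values above.
    (* fsm_encoding = "none" *) State state;
    State stateNext;

    // state register with synchronous reset
    always_ff @(posedge sysclk) begin
        if (rst)
            state <= S0;
        else
            state <= stateNext;
    end

    // next-state logic
    always_comb begin
        stateNext = state;
        case (state)
            S0: if (in)  stateNext = S1;
            S1: if (!in) stateNext = S2;
            S2: begin
                if (in)
                    stateNext = S1;
                else
                    stateNext = S3;
            end
            S3: stateNext = S0;
            default: stateNext = S0;
        endcase
    end

    // Mealy output: depends on the current state and the input
    assign out = (state == S3) & in;

endmodule
```

With a known encoding, the state bits in the post-synthesis waveform can be read directly against S0 to S3 instead of being guessed from a tool-chosen encoding.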