Problem with VCS simulation for MAC operation

I am trying to write and simulate a module that performs a MAC (multiply-accumulate) operation. My code is shown below.

module PE # (
parameter DW = 8
)
(
input clk,
input rst_n,
input  [DW-1 : 0] cin,
input  [DW-1 : 0] w,
output  [DW-1 : 0] cin_out,
output  [DW-1 : 0] w_out,
output [2*DW : 0] pe_out
);

reg [DW-1 : 0] cin_reg;
reg [DW-1 : 0] w_reg;
reg [2*DW : 0] m_reg;
reg [2*DW : 0] pe_reg;


always @(posedge clk) begin
    if(rst_n==1'b0) begin
        pe_reg <= 0;
        cin_reg <= 0;
        w_reg <= 0;
    end
    else begin
        cin_reg <= cin;
        w_reg <= w;
        m_reg <= cin_reg * w_reg;
        pe_reg <= pe_reg + m_reg;
    end
end

assign cin_out = cin_reg;
assign w_out = w_reg;
assign pe_out = pe_reg;

endmodule

I simulated it with VCS, but pe_out stays at x (xxxxxxxx) for the whole run, as shown in the figure below.

[Figure: false wave]

A friend ran the same code in Verilator, and there it works as expected. If I delete pe_reg <= pe_reg + m_reg, it also works. So the problem seems to be caused by the addition, but I have not been able to solve it.

I would appreciate any pointers. This has confused me for hours.

My testbench is below.

module tb_PE;

reg clk;
reg rst_n;
reg [7:0] cin;
reg [7:0] w;

wire [7:0] cin_out;
wire [7:0] w_out;
wire [16:0] pe_out;
wire [16:0] pe_out_tmp;

initial begin

clk = 0;
forever begin
    #10;
    clk = ~clk;
end

end
initial begin
rst_n = 1'b1;
#5;
rst_n = 1'b0;
#10;
rst_n = 1'b1;
#5;
cin = 8'h01;
w = 8'h02;
#15;
cin = 8'h03;
w = 8'h04;
#20;
cin = 8'h05;
w = 8'h03;
#20;
cin = 8'h90;
w = 8'h88;
#20;
cin = 8'h65;
w = 8'h20;
#100;
$finish;
end

// The instance must sit at module level, outside the initial block.
PE PE_U (
.clk(clk),
.rst_n(rst_n),
.cin(cin),
.w(w),
.cin_out(cin_out),
.w_out(w_out),
.pe_out(pe_out)
);

endmodule

Solution

As @mkrieger1 mentioned, you never initialize m_reg: it is missing from the reset branch, so it starts out as x. The following then happens:

At the first posedge (reset active) you initialize some of the variables, including pe_reg, while m_reg is still x.
At the second posedge m_reg is still x. The nonblocking assignment to m_reg only schedules it to change later, so when pe_reg <= pe_reg + m_reg; is evaluated, m_reg still reads as x.
As a result, pe_reg becomes x again, and it stays x from then on: pe_reg appears on the right-hand side of its own assignment, and adding anything to x gives x.

So the easiest fix is to reset m_reg in the same branch as pe_reg. If for some reason that is not acceptable, you need to delay the accumulation into pe_reg by one more cycle, so that it only starts once m_reg holds a known value.
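A minimal sketch of the first option, assuming the rest of the module stays exactly as in the question. The only change is the extra m_reg <= 0; in the reset branch. The comments are mine, not part of the original code.

```verilog
module PE # (
    parameter DW = 8
)
(
    input               clk,
    input               rst_n,
    input  [DW-1 : 0]   cin,
    input  [DW-1 : 0]   w,
    output [DW-1 : 0]   cin_out,
    output [DW-1 : 0]   w_out,
    output [2*DW : 0]   pe_out
);

reg [DW-1 : 0] cin_reg;
reg [DW-1 : 0] w_reg;
reg [2*DW : 0] m_reg;
reg [2*DW : 0] pe_reg;

always @(posedge clk) begin
    if (rst_n == 1'b0) begin
        pe_reg  <= 0;
        cin_reg <= 0;
        w_reg   <= 0;
        m_reg   <= 0;   // added: without this, m_reg is x after reset
    end
    else begin
        cin_reg <= cin;
        w_reg   <= w;
        m_reg   <= cin_reg * w_reg;   // product of the previous cycle's inputs
        pe_reg  <= pe_reg + m_reg;    // reads the old m_reg, which is now 0 rather than x
    end
end

assign cin_out = cin_reg;
assign w_out   = w_reg;
assign pe_out  = pe_reg;

endmodule
```

With this change, every register feeding the adder has a known value once reset is released, so pe_out should start from 0 in VCS instead of going to x. This also assumes cin and w are driven with known values before the first clock edge after reset, as they are in the testbench from the question.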