Search code examples
sortingverilogquartus

How to fix long compilation for Verilog HDL in quartus


I've been trying to create a counting sort algorithm using Verilog HDL, but when I tried to compile this iteration of it, Quartus started to compile it for a really long time. I can't figure out what is the issue.

module sort(reset, clk, data_in0,data_in1,data_in2,data_in3,data_in4,data_in5,data_in6,data_in7,data_in8,data_in9, data_out0, data_out1, data_out2, data_out3, data_out4, data_out5, data_out6, data_out7, data_out8, data_out9);

input wire reset, clk;

input wire [1:0] data_in0;

input wire [1:0] data_in1;

input wire [1:0] data_in2;

input wire [1:0] data_in3;

input wire [1:0] data_in4;

input wire [1:0] data_in5;

input wire [1:0] data_in6;

input wire [1:0] data_in7;

input wire [1:0] data_in8;

input wire [1:0] data_in9;

output reg [1:0] data_out0;

output reg [1:0] data_out1;

output reg [1:0] data_out2;

output reg [1:0] data_out3;

output reg [1:0] data_out4;

output reg [1:0] data_out5;

output reg [1:0] data_out6;

output reg [1:0] data_out7;

output reg [1:0] data_out8;

output reg [1:0] data_out9;

reg [1:0] mem [9:0];


reg[9:0] buff [3:0];
integer i,k,j,f,s;

always@ (posedge clk)

begin

    for(i=0; i<4; i=i+1)
    buff[i]<=0;
    if (reset == 1) begin

    for (i = 0; i < 10; i = i + 1) mem[i]<=0;
    s=0;
    f=0;


end

else begin
if (f==0)begin
mem [0] <= data_in0;
mem[1]<=data_in1;

mem[2]<=data_in2;

mem[3]<=data_in3;

mem[4]<=data_in4;

mem[5]<=data_in5;

mem[6]<=data_in6;

mem[7]<=data_in7;

mem[8]<=data_in8;

mem[9]<=data_in9;
f=1;
end
 for( i = 0; i <10 ; i=i+1)
begin
    buff[mem[i]]<=buff[mem[i]]+1;
end
if(s==0) begin
k<=0;
    for( i = 0; i <4 ; i=i+1)
    begin
        for( j = 0; j < 10 ; j = j +1)
        begin
            if(j<buff[i])
            begin
                mem[k]<=i;
                k<=k+1;
            end
        end
    end

end     s=1;    

data_out0 = mem[0];

data_out1 = mem[1];

data_out2 = mem[2];

data_out3 = mem[3];

data_out4 = mem[4];

data_out5 = mem[5];

data_out6 = mem[6];

data_out7 = mem[7];

data_out8 = mem[8];

data_out9 = mem[9];

end

end

endmodule

It takes a really long time to pass the Analysis and Synthesis section. I assume it is due to mistakes in this code or the wrong use of operators, but I can't understand where it is exactly.


Solution

  • This is an example of how a FSM-based solution for your problem would be. It can be vastly improved, though, but it's just a starting (and hopefully working) point.

    For start, I've changed your module interface. Discrete inputs can be used, but as the algorithm uses indexes to run over the entire input domain, I've assume two external memories: one with the input data and another one which will hold the output data. The module drives the corresponding address bus for both memories, as well as the write enable signal, and data buses. There is also a busy signal so the rest of the system knows that the module has not yet finished sorting data. Finally, I've sorted 16 numbers instead of 10.

    Internally, I've used a memory element, count, as the vector that holds the histogram of the input data. As this memory is tiny, I've used as four independent registers. This allows me to use more than one element of "count" in the same clock cycle, as in count[3] <= count[3] + count[2] + count[1] + count[0];

    The version of the algorithm I've used is from Wikipedia: https://en.wikipedia.org/wiki/Counting_sort

    function countingSort(array, k) is
      count ← new array of k zeros
      for i = 1 to length(array) do
        count[array[i]] ← count[array[i]] + 1
      for i = 2 to k do
        count[i] ← count[i] + count[i - 1]
      for i = length(array) downto 1 do
        output[count[array[i]]] ← array[i]
        count[array[i]] ← count[array[i]] - 1
      return output
    

    And this is the Verilog module:

    module sort (
      input wire clk,
      input wire reset,
      output reg [3:0] addr_data_in,
      input wire [1:0] data_in,
      output reg [3:0] addr_data_out,
      output reg [1:0] data_out,
      output reg write_data_out_strobe,
      output reg busy
    );
    
    /*
    function countingSort(array, k) is
      count ← new array of k zeros
      for i = 1 to length(array) do
        count[array[i]] ← count[array[i]] + 1
      for i = 2 to k do
        count[i] ← count[i] + count[i - 1]
      for i = length(array) downto 1 do
        output[count[array[i]]] ← array[i]
        count[array[i]] ← count[array[i]] - 1
      return output
    */
      localparam
        ZERO         = 3'd0,
        MAKEHIST1    = 3'd1,
        MAKEHIST2    = 3'd2,
        PREFIXSUM    = 3'd3,
        PLACEOUTPUT1 = 3'd4,
        PLACEOUTPUT2 = 3'd5,
        IDLE         = 3'd7
        ;
    
      reg [4:0] count[0:3];
      reg [2:0] state = IDLE;
      reg [1:0] data;
    
      always @(posedge clk) begin
        if (reset == 1'b1) begin
          state <= ZERO;
          write_data_out_strobe <= 1'b0;
          busy <= 1'b1;
        end
        else begin
          case (state)
            ZERO:
            //count ← new array of k zeros
              begin
                count[0] <= 4'd0;
                count[1] <= 4'd0;
                count[2] <= 4'd0;
                count[3] <= 4'd0;
                addr_data_in <= 4'd0;
                state <= MAKEHIST1;
              end
            MAKEHIST1:
            //for i = 1 to length(array) do
            //  count[array[i]] ← count[array[i]] + 1
              begin
                data <= data_in;
                addr_data_in <= addr_data_in + 4'd1;
                state <= MAKEHIST2;
              end
            MAKEHIST2:
              begin
                count[data] <= count[data] + 4'd1;
                if (addr_data_in == 4'd0)
                  state <= PREFIXSUM;
                else
                  state <= MAKEHIST1;
              end
            PREFIXSUM:
            //for i = 2 to k do
            //  count[i] ← count[i] + count[i - 1]
              begin
                count[1] <= count[1] + count[0];
                count[2] <= count[2] + count[1] + count[0];
                count[3] <= count[3] + count[2] + count[1] + count[0];
                addr_data_in <= 4'd15;
                state <= PLACEOUTPUT1;
              end
            PLACEOUTPUT1:
            //for i = length(array) downto 1 do
            //  output[count[array[i]]] ← array[i]
            //  count[array[i]] ← count[array[i]] - 1
              begin
                data <= data_in;
                addr_data_in <= addr_data_in - 4'd1;
                write_data_out_strobe <= 1'b0;
                state <= PLACEOUTPUT2;
              end
            PLACEOUTPUT2:
              begin
                addr_data_out <= count[data] - 5'd1;
                data_out <= data;
                count[data] <= count[data] - 4'd1;
                write_data_out_strobe <= 1'b1;
                if (addr_data_in == 4'd15)
                  state <= IDLE;
                else
                  state <= PLACEOUTPUT1;
              end
            IDLE:
              begin
                write_data_out_strobe <= 1'b0;
                busy <= 1'b0;
              end
          endcase
        end  // of else
      end  // of always
    endmodule
    

    You can see that because of the way I'm using count, this will surely generate lots of muxes and decoders, just because I'm using a register value as the address for count[] in some places. However, I think this will synthesize much faster. Yosis can make it in a couple of seconds, FYI.

    Besides, here you have a test bench for the above module:

    module tb_counting_sort;
      reg clk, reset;
      wire [3:0] addr_data_in, addr_data_out;
      wire [1:0] data_in,data_out;
      wire write_data_out_strobe, busy;
    
      sort uut (
        .clk(clk),
        .reset(reset),
        .addr_data_in(addr_data_in),
        .data_in(data_in),
        .addr_data_out(addr_data_out),
        .data_out(data_out),
        .write_data_out_strobe(write_data_out_strobe),
        .busy(busy)
      );
    
      reg [1:0] vector_in[0:15];
      reg [1:0] vector_out[0:15];
      assign data_in = vector_in[addr_data_in];
      always @(posedge clk)
        if (write_data_out_strobe == 1'b1)
          vector_out[addr_data_out] <= data_out;
    
      integer i;
      initial begin
        vector_in[0]  = 2'd2;
        vector_in[1]  = 2'd1;
        vector_in[2]  = 2'd0;
        vector_in[3]  = 2'd0;
        vector_in[4]  = 2'd3;
        vector_in[5]  = 2'd1;
        vector_in[6]  = 2'd0;
        vector_in[7]  = 2'd2;
        vector_in[8]  = 2'd1;
        vector_in[9]  = 2'd1;
        vector_in[10] = 2'd3;
        vector_in[11] = 2'd3;
        vector_in[12] = 2'd3;
        vector_in[13] = 2'd2;
        vector_in[14] = 2'd1;
        vector_in[15] = 2'd0;
    
        reset = 1'b1;
        clk = 1'b0;
        repeat (2) 
          @(posedge clk);
        reset = 1'b0;
    
        @(negedge busy);
        for (i=0;i<16;i=i+1)
          $write ("%1d ", vector_out[i]);
        $display("");
        $finish;
      end
    
      always begin
        clk = #5 ~clk;
      end
    endmodule
    

    Both modules can be viewed, simulated or synthesized at EDAPlayground, here: https://www.edaplayground.com/x/6GLj