Stack memory implementation not working properly in Chisel for Rocket Chip


I've been modifying the RoCC interface of the Rocket core. So far it works as a scratchpad that we can load data from and store data to using the custom0 instruction. I'm facing an issue when I try to push and pop data on a stack memory that I created in Chisel and instantiated inside my scratchpad. I'm using the same custom0 instruction with different values of the funct field to push to and pop from the stack.

The Chisel code for both the scratchpad and my stack is shown below.

class Comm_Scratchpad(n: Int = 8)(implicit p: Parameters) extends RoCC()(p) {
  val Stack_Lo = Module(new Stack_Snd(64) )
  val Scratchpad_Loc = Mem(UInt(width = xLen), n:Int)
  val busy = Reg(init=Vec(Bool(false), n))

  val cmd = Queue(io.cmd) //wired to decoupled command coming into rocc interface from rocket core
  val funct = cmd.bits.inst.funct //Function to decide what operation to perform
  val addr = cmd.bits.inst.rs2(log2Up(n)-1,0) //converts address specified by user(0,1,2,3) into address that can be used for scratchpad
  val doWrite = funct === UInt(0) //Used to check whether the user wants to do a write; derives its value from funct
  val doRead = funct === UInt(1) //Ignored for now as every operation is by default performing a read
  val doPush = funct === UInt(2) //Funct will be used to load to lifo with the same custom instruction
  val doPop = funct === UInt(3) //funct will be used to load back from lifo

  Stack_Lo.io.en := Mux((doPush || doPop), Bool(true), Bool(false))
  Stack_Lo.io.push := Mux(doPush, Bool(true), Bool(false))
  Stack_Lo.io.pop := Mux(doPop, Bool(true), Bool(false))

  // datapath
  val data_in = cmd.bits.rs1
  val wdata = data_in
  val rdata = Mux(doPop, Stack_Lo.io.dataOut, Scratchpad_Loc(addr))
  Stack_Lo.io.dataIn := Mux(doPush, data_in, UInt(0))

  when (cmd.fire() && (doWrite)) {
    Scratchpad_Loc(addr) := wdata
  }

  val doResp = cmd.bits.inst.xd
  val stallReg = busy(addr)
  //val stallLoad = doLoad && !io.mem.req.ready
  val stallResp = doResp && !io.resp.ready && (doPush || doPop)

  cmd.ready := !stallReg && !stallResp //removed stall load as we are not loading from memory
    // command resolved if no stalls AND not issuing a load that will need a request

  // PROC RESPONSE INTERFACE
  io.resp.valid := cmd.valid && doResp && !stallReg //&& !stallLoad
    // valid response if valid command, need a response, and no stalls
  io.resp.bits.rd := cmd.bits.inst.rd
    // Must respond with the appropriate tag or undefined behavior
  io.resp.bits.data := rdata
    // Semantics is to always send out prior accumulator register value

  io.busy := cmd.valid || busy.reduce(_||_)
    // Be busy when have pending memory requests or committed possibility of pending requests
  io.interrupt := Bool(false)
    // Set this true to trigger an interrupt on the processor (please refer to supervisor documentation)

}


class Stack_IO (implicit p: Parameters) extends CoreBundle
{ 
    val dataIn  = UInt(INPUT,  64) 
    val dataOut = UInt(OUTPUT, 64) 
    val push    = Bool(INPUT) 
    val pop     = Bool(INPUT) 
    val en      = Bool(INPUT) 
}

class Stack_Snd(depth: Int) (implicit p: Parameters) extends CoreModule {

  val io = new Stack_IO;
  // declare the memory for the stack 
  val stack_mem = 
    Mem(UInt(width = 64), depth) 
  val sp = Reg(init = UInt(0, width = log2Up(depth))) 
  val dataOut = Reg(init = UInt(0, width = 64)) 

  // Push condition - make sure stack isn't full 
  when(io.en && io.push && (sp != UInt(depth-1))) { 
    stack_mem(sp) := io.dataIn 
    sp := sp + UInt(1)  
  } 
  // Pop condition - make sure the stack isn't empty 
  .elsewhen(io.en && io.pop && (sp > UInt(0))) { 
    dataOut := stack_mem(sp - UInt(1))
    sp := sp - UInt(1) 
  } 
  io.dataOut := dataOut
}

The C code that I'm executing on the front-end server is below.

// The following is a RISC-V program to test the functionality of the
// dummy RoCC accelerator.
// Compile with riscv64-unknown-elf-gcc dummy_rocc_test.c
// Run with spike --extension=dummy_rocc pk a.out

#include <assert.h>
#include <stdio.h>
#include <stdint.h>


int main() {
  uint64_t x = 1, y = 456, z = 0, a=2, b=3, c=4, d=5, e=6;

  asm volatile ("custom0 x0, %0, 0, 2" : : "r"(x));
  asm volatile ("custom0 x0, %0, 1, 2" : : "r"(a));
  asm volatile ("custom0 x0, %0, 2, 2" : : "r"(b));
  asm volatile ("custom0 x0, %0, 3, 2" : : "r"(c));
  asm volatile ("custom0 x0, %0, 4, 2" : : "r"(d));
  asm volatile ("custom0 x0, %0, 5, 2" : : "r"(e));


  // z is a uint64_t, so print it with %lu (cast to unsigned long) rather than %d
  asm volatile ("custom0 %0, x0, 0, 3" : "=r"(z));
  printf("The popped value of z 0 is:- %lu \n", (unsigned long)z);

  asm volatile ("custom0 %0, x0, 1, 3" : "=r"(z));
  printf("The popped value of z 1 is:- %lu \n", (unsigned long)z);

  asm volatile ("custom0 %0, x0, 2, 3" : "=r"(z));
  printf("The popped value of z 2 is:- %lu \n", (unsigned long)z);

  asm volatile ("custom0 %0, x0, 3, 3" : "=r"(z));
  printf("The popped value of z 3 is:- %lu \n", (unsigned long)z);

  asm volatile ("custom0 %0, x0, 4, 3" : "=r"(z));
  printf("The popped value of z 4 is:- %lu \n", (unsigned long)z);

  asm volatile ("custom0 %0, x0, 5, 3" : "=r"(z));
  printf("The popped value of z 5 is:- %lu \n", (unsigned long)z);

  printf("success!\n");
}

As you can see, I'm trying to push the numbers 1 through 6, but when I run this code on the ZedBoard, this is the output I get:

root@zynq:~# ./fesvr-zynq pk /sdcard/Custom\ elfs/rocc_fifo 
The popped value of z 0 is:- 1 
The popped value of z 1 is:- 0 
The popped value of z 2 is:- 2 
The popped value of z 3 is:- 2 
The popped value of z 4 is:- 2 
The popped value of z 5 is:- 2 

Ideally it should pop 6, 5, 4, 3, 2, 1.
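
For reference, the expected order can be reproduced with a small software model of the Stack_Snd module. This is only a sketch in C, not the hardware itself. It assumes that each custom0 command drives the stack for exactly one clock cycle, which the question does not confirm. The names stack_model, stack_step and run_expected_sequence are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stdint.h>

#define STACK_DEPTH 64  /* matches new Stack_Snd(64) in the question */

/* Software stand-in for the memory and registers inside Stack_Snd. */
typedef struct {
    uint64_t mem[STACK_DEPTH];
    unsigned sp;        /* index of the next free slot */
    uint64_t data_out;  /* registered output, updated only on a pop */
} stack_model;

/* Advance the model by one clock edge. The priority mirrors the
 * when/.elsewhen chain in the question: a push wins over a pop. */
static void stack_step(stack_model *s, int en, int push, int pop,
                       uint64_t data_in)
{
    assert(s->sp < STACK_DEPTH);
    if (en && push && s->sp != STACK_DEPTH - 1) {
        s->mem[s->sp] = data_in;
        s->sp++;
    } else if (en && pop && s->sp > 0) {
        s->data_out = s->mem[s->sp - 1];
        s->sp--;
    }
}

/* Assumption: one cycle per command. Push 1..6, then pop six times,
 * recording the registered output after each pop. */
static void run_expected_sequence(uint64_t out[6])
{
    stack_model s = {0};
    for (uint64_t v = 1; v <= 6; v++)
        stack_step(&s, 1, 1, 0, v);
    for (int i = 0; i < 6; i++) {
        stack_step(&s, 1, 0, 1, 0);
        out[i] = s.data_out;
    }
}
```

Calling run_expected_sequence with a six-element array fills it with 6, 5, 4, 3, 2, 1. Any other order on the board suggests that the control signals reaching the stack are not one clean pulse per command.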


Solution

  • I resolved this issue. When the stack's push signal is high, the pop signal must be kept low, and vice versa. Otherwise, when we push while the pop signal is also high, the value is popped again almost immediately, which produces the output in the logs above. It took me a long time to debug, hence the late repost.
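
The rule in the answer can be pictured as a small decode step that never asserts push and pop together. The sketch below is written in C rather than Chisel so it can be checked on its own, and it is not the code that was actually changed. The cmd_fire input is an extra assumption: the answer only asks for push and pop to be mutually exclusive, while this sketch also gates both on the command being accepted. The names stack_ctrl and decode_stack_ctrl are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* funct encodings taken from the question's doPush / doPop lines. */
enum { FUNCT_PUSH = 2, FUNCT_POP = 3 };

/* Control signals that would drive Stack_Snd's en, push and pop inputs. */
typedef struct {
    int en;
    int push;
    int pop;
} stack_ctrl;

/* Derive the stack controls from the decoded command.
 * cmd_fire is assumed to be 1 only in the cycle the command is accepted. */
static stack_ctrl decode_stack_ctrl(int cmd_fire, unsigned funct)
{
    stack_ctrl c;
    c.push = cmd_fire && funct == FUNCT_PUSH;
    c.pop  = cmd_fire && funct == FUNCT_POP;
    c.en   = c.push || c.pop;
    /* The invariant from the answer: never push and pop together. */
    assert(!(c.push && c.pop));
    return c;
}
```

With this decode, a push command yields push = 1 and pop = 0, a pop command yields the reverse, and any cycle without an accepted command leaves all three signals low.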