
Template function calls to other functions


I understand that template functions are usually declared and defined in header files.

My problem is that my template function calls other functions. The prototypes of those functions are in the same header file, before the template function itself.

That portion of the code:

//header.h

template <int ignoreAdetection>
__global__ void MCMLKernel(SimState d_state, GPUThreadStates tstates)
{
  // photon structure stored in registers
  PhotonStructGPU photon;

  // random number seeds
  UINT64 rnd_x;
  UINT32 rnd_a;

  // Flag to indicate if this thread is active
  UINT32 is_active;

  // Restore the thread state from global memory.
  RestoreThreadState(&d_state, &tstates, &photon, &rnd_x, &rnd_a, &is_active);
...
...
}

The function RestoreThreadState is the first of several functions called from this template function. The others are called within a for loop.

I am not sure whether this template function should be in the header file. If it should, how do I call those other functions from it?

The errors I am getting from the compiler during the instantiation of MCMLKernel:

  • error: explicit type is missing ("int" assumed)
  • error: variable "RestoreThreadState" may not be initialized
  • error: a value of type "SimState *" cannot be used to initialize an entity of type "int"
  • error: expected a ")"
  • warning: declaration is incompatible with previous "RestoreThreadState"

Additional details: all of these functions are CUDA functions. MCMLKernel is a __global__ kernel and the functions it calls are __device__ functions. I am using Nsight Eclipse Edition and compute capability 1.3 GPUs (four Tesla C1060 cards).


Solution

  • @Eugene provided the answer to this question.

    I had written the prototype of the called function like this:

    __device__ void RestoreThreadState(SimState *d_state, GPUThreadStates *tstates,
                                       PhotonStructGPU *photon,
                                       UINT64 *rnd_x, UINT32 *rnd_a,
                                       UINT32 *is_active);
    

    But, following the example given by Eugene, it should be written like this (this is a definition, NOT A PROTOTYPE; see the other answer):

    __device__ void RestoreThreadState(SimState *d_state, GPUThreadStates *tstates,
                                       PhotonStructGPU *photon,
                                       UINT64 *rnd_x, UINT32 *rnd_a,
                                       UINT32 *is_active){}
    

    Note the {} at the end.

    Next, the code calls functions that actually return values, like this one:

    __device__ int HitBoundary(PhotonStructGPU *photon);
    

    It is called from the same MCMLKernel template function, and it gives me a warning:

    • warning: missing return statement at end of non-void function "HitBoundary"
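The warning implies the compiler saw a body for HitBoundary that reaches its closing brace without returning anything. An empty {} is harmless for a void function, but in C++ flowing off the end of a non-void function is undefined behaviour if the caller uses the result. A minimal sketch of a stub that avoids the warning follows. PhotonStub, HitBoundaryStub and the return value 0 are placeholders for illustration, not the real PhotonStructGPU or the real boundary logic, and HD is a hypothetical macro that lets the same text build with nvcc or with a plain host compiler.

```cpp
// HD expands to CUDA qualifiers under nvcc and to nothing otherwise.
// It is a hypothetical helper macro, not part of CUDA.
#ifdef __CUDACC__
#define HD __host__ __device__
#else
#define HD
#endif

// Placeholder type; the real PhotonStructGPU is not shown in the post.
struct PhotonStub {
  float z;
};

// Stub with an explicit return, so there is no "missing return statement".
// Returning 0 to mean "no boundary hit" is an assumption for illustration.
HD inline int HitBoundaryStub(PhotonStub *photon) {
  (void)photon;  // unused in the stub
  return 0;
}
```

A stub like this only silences the warning; the real function body still has to be visible wherever the kernel is instantiated.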

    UPDATE: In another source file, kernel.cu, I have the following definition (which is why I get multiple-definition problems):

    __device__ void RestoreThreadState(SimState *d_state, GPUThreadStates *tstates,
                                       PhotonStructGPU *photon,
                                       UINT64 *rnd_x, UINT32 *rnd_a,
                                       UINT32 *is_active)
    {
      UINT32 tid = blockIdx.x * NUM_THREADS_PER_BLOCK + threadIdx.x;
    
      *rnd_x = d_state->x[tid];
      *rnd_a = d_state->a[tid];
    
      photon->x = tstates->photon_x[tid];
      photon->y = tstates->photon_y[tid];
      photon->z = tstates->photon_z[tid];
      photon->ux = tstates->photon_ux[tid];
      photon->uy = tstates->photon_uy[tid];
      photon->uz = tstates->photon_uz[tid];
      photon->w = tstates->photon_w[tid];
      photon->sleft = tstates->photon_sleft[tid];
      photon->layer = tstates->photon_layer[tid];
    
      *is_active = tstates->is_active[tid];
    }
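An empty-bodied RestoreThreadState in the header plus the full one in kernel.cu gives the compiler two definitions of the same function, which would explain the multiple-definition problems. One common way out, sketched here under assumptions, is to keep a single definition in the header and mark it inline, so that every .cu file that includes the header may see it. The struct layouts are cut-down stand-ins (only x, a and is_active), the explicit tid parameter replaces the blockIdx/threadIdx computation so the sketch can also be checked with a host compiler, and HD and all the *Stub/*Sketch names are hypothetical.

```cpp
#ifndef KERNEL_SKETCH_CUH
#define KERNEL_SKETCH_CUH

#include <cstdint>

// HD expands to CUDA qualifiers under nvcc and to nothing otherwise.
#ifdef __CUDACC__
#define HD __host__ __device__
#else
#define HD
#endif

// Cut-down stand-ins for SimState and GPUThreadStates (assumed layouts).
struct SimStateStub {
  const std::uint64_t *x;  // per-thread values restored into rnd_x
  const std::uint32_t *a;  // per-thread values restored into rnd_a
};
struct ThreadStatesStub {
  const std::uint32_t *is_active;  // per-thread activity flags
};

// Single definition, visible to every file that includes this header.
// `inline` permits the same definition in several translation units.
HD inline void RestoreThreadStateSketch(const SimStateStub *d_state,
                                        const ThreadStatesStub *tstates,
                                        std::uint64_t *rnd_x,
                                        std::uint32_t *rnd_a,
                                        std::uint32_t *is_active,
                                        std::uint32_t tid) {
  *rnd_x = d_state->x[tid];
  *rnd_a = d_state->a[tid];
  *is_active = tstates->is_active[tid];
}

#endif  // KERNEL_SKETCH_CUH
```

With this layout the copy of the function in kernel.cu would be removed, leaving the header as the only place where it is defined.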
    

    IN SUMMARY: I have four source files

    1. main.cu
    2. kernel.cu
    3. rng.cu
    4. mem.cu

    Each source file, except main.cu, has an associated header

    1. kernel.cuh
    2. rng.cuh
    3. mem.cuh

    where I want to forward-declare functions to use in main.cu.

    Everything is fine until I get to the template function which calls functions from kernel.cu and rng.cu.
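A template is instantiated in whichever file uses it, so everything it calls has to be usable from that file. As far as I know, separate compilation of device code (nvcc's relocatable device code) needs compute capability 2.0 or later, so on 1.3 hardware the __device__ functions called by MCMLKernel would have to be defined, not only declared, in the translation unit that instantiates the kernel. The sketch below shows that shape with a plain function template so it can be checked on the host. StepSketch, RunSketch, Launch and their arithmetic are hypothetical stand-ins for the real helpers in kernel.cu and rng.cu, and HD is a hypothetical macro.

```cpp
// HD expands to CUDA qualifiers under nvcc and to nothing otherwise.
#ifdef __CUDACC__
#define HD __host__ __device__
#else
#define HD
#endif

// Helper defined (not just declared) before the template that calls it.
HD inline int StepSketch(int weight) { return weight - 1; }

// Stand-in for MCMLKernel<ignoreAdetection>. The template argument is a
// compile-time constant, so the untaken branch can be dropped by the compiler.
template <int ignoreAdetection>
HD inline int RunSketch(int weight) {
  int detected = 0;
  while (weight > 0) {
    weight = StepSketch(weight);
    if (!ignoreAdetection) ++detected;  // counted only when detection is on
  }
  return detected;
}

// What the including file would do: pick an instantiation from a runtime flag.
inline int Launch(bool ignore, int weight) {
  return ignore ? RunSketch<1>(weight) : RunSketch<0>(weight);
}
```

With the real kernel, each branch of the dispatch would be a launch such as MCMLKernel<1><<<blocks, threads>>>(d_state, tstates), where blocks and threads are placeholders for the launch configuration.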