I understand that template functions usually have to be declared and defined in header files.
The problem I am having is that my template function makes calls to other functions. The prototypes of those other functions are in the same header file, before the template function itself.
That portion of the code:
    //header.h
    template <int ignoreAdetection>
    __global__ void MCMLKernel(SimState d_state, GPUThreadStates tstates)
    {
        // photon structure stored in registers
        PhotonStructGPU photon;
        // random number seeds
        UINT64 rnd_x;
        UINT32 rnd_a;
        // Flag to indicate if this thread is active
        UINT32 is_active;
        // Restore the thread state from global memory.
        RestoreThreadState(&d_state, &tstates, &photon, &rnd_x, &rnd_a, &is_active);
        ...
        ...
    }
The function RestoreThreadState is the first of several functions called from this template function. The others are called within a for loop.
I am not sure whether this template function should be in the header file. If it should be in a header file, how do I call those other functions?
The errors I am getting from the compiler during the instantiation of MCMLKernel:
Additional details: all of these functions are CUDA functions. MCMLKernel is a __global__ kernel and the rest of the functions it calls are __device__ functions. I am using Nsight Eclipse Edition and compute capability 1.3 GPUs (four Tesla C1060 cards).
@Eugene provided the answer to this question.
I created the prototype of the called function like this:
__device__ void RestoreThreadState(SimState *d_state, GPUThreadStates *tstates,
PhotonStructGPU *photon,
UINT64 *rnd_x, UINT32 *rnd_a,
UINT32 *is_active);
But, according to the example given by Eugene, it should be like this (NOT A PROTOTYPE! See other answer):
__device__ void RestoreThreadState(SimState *d_state, GPUThreadStates *tstates,
PhotonStructGPU *photon,
UINT64 *rnd_x, UINT32 *rnd_a,
UINT32 *is_active){}
Note the {} at the end.
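To make the difference between the two forms concrete, here is a minimal sketch. The names LoadValue and LoadValueStub are hypothetical and are not part of the MCML code, and the macro at the top is only there so the snippet also builds with a plain C++ compiler. A prototype only promises a body; adding {} supplies a body that does nothing, so the call compiles and links but no state is actually restored.

```cpp
// nvcc defines __CUDACC__ and provides the real __device__ qualifier;
// the empty fallback lets a plain C++ compiler build this sketch too.
#ifndef __CUDACC__
#define __device__
#endif

// Prototype (declaration only): the body still has to be provided somewhere
// the compiler can see it.
__device__ void LoadValue(int *value);

// Definition with an empty body: this satisfies the compiler and linker,
// but the pointed-to value is left untouched.
__device__ void LoadValueStub(int * /*value*/) {}

// Definition with a real body, matching the prototype above.
__device__ void LoadValue(int *value)
{
    *value = 42;
}
```

So the empty-body version only makes the error go away; the real body still has to be the one the kernel ends up calling.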
Next in the code I call functions that actually return values, like this:
__device__ int HitBoundary(PhotonStructGPU *photon);
It is called from the same MCMLKernel template function. It is giving me a warning:
UPDATE: In another source file, kernel.cu, I have the following declaration and definition (the reason why I have multiple-definition problems):
    __device__ void RestoreThreadState(SimState *d_state, GPUThreadStates *tstates,
                                       PhotonStructGPU *photon,
                                       UINT64 *rnd_x, UINT32 *rnd_a,
                                       UINT32 *is_active)
    {
        UINT32 tid = blockIdx.x * NUM_THREADS_PER_BLOCK + threadIdx.x;
        *rnd_x = d_state->x[tid];
        *rnd_a = d_state->a[tid];
        photon->x = tstates->photon_x[tid];
        photon->y = tstates->photon_y[tid];
        photon->z = tstates->photon_z[tid];
        photon->ux = tstates->photon_ux[tid];
        photon->uy = tstates->photon_uy[tid];
        photon->uz = tstates->photon_uz[tid];
        photon->w = tstates->photon_w[tid];
        photon->sleft = tstates->photon_sleft[tid];
        photon->layer = tstates->photon_layer[tid];
        *is_active = tstates->is_active[tid];
    }
IN SUMMARY: I have four source files
Each source file except main.cu has an associated header, where I want to forward-declare the functions to use in main.cu.
Everything is fine until I get to the template function which calls functions from kernel.cu and rng.cu.
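One layout that avoids both the missing-definition and the multiple-definition problems is sketched below. It assumes the toolchain cannot link device code across .cu files (as far as I know, separate device compilation needs compute capability 2.0 or later, so it would not apply to the C1060), which means the __device__ helpers have to be defined in the same translation unit as the kernel that calls them. The names DemoState, RestoreDemoState and DemoKernel are hypothetical stand-ins, not the real MCML types, and the macro fallback is only there so the sketch also builds as plain C++.

```cpp
// demo_kernel.cuh (hypothetical header name)
#ifndef DEMO_KERNEL_CUH
#define DEMO_KERNEL_CUH

// nvcc defines __CUDACC__ and supplies the real qualifiers; the empty
// fallbacks let a plain C++ compiler build this sketch as well.
#ifndef __CUDACC__
#define __device__
#define __global__
#endif

// Hypothetical stand-in for the saved per-thread state.
struct DemoState {
    unsigned int seed;
    unsigned int is_active;
};

// Defined (not just declared) in the header and marked inline, so every
// .cu file that includes the header sees the body and the copies do not
// clash with each other.
__device__ inline void RestoreDemoState(const DemoState *saved,
                                        unsigned int *seed,
                                        unsigned int *is_active)
{
    *seed = saved->seed;
    *is_active = saved->is_active;
}

// Template kernel defined after the helper it calls.
template <int ignoreAdetection>
__global__ void DemoKernel(const DemoState *saved, unsigned int *out)
{
    unsigned int seed;
    unsigned int is_active;
    RestoreDemoState(saved, &seed, &is_active);
    // The template parameter is a compile-time constant, so this branch
    // is resolved per instantiation.
    *out = (ignoreAdetection != 0) ? seed : seed + is_active;
}

#endif // DEMO_KERNEL_CUH
```

With nvcc, main.cu would include this header and launch an instantiation such as DemoKernel<1><<<1, 1>>>(d_saved, d_out), where d_saved and d_out are device pointers. The bodies then live only in the header, so kernel.cu must not define the same functions again.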