I have a functor used by Thrust in which I need to set the length of a local array dynamically, like
struct func {
const int h;
func(const int _h): h(_h) {}
__device__ __host__
void operator()(int id) {
double data[h];
}
};
I'm not sure how to do this, because the array size has to be a compile-time constant, but h is not known until run time.
The obvious way to solve this is to use dynamic memory allocation, so the functor becomes
__device__ __host__
void operator()(int id) {
double *data = new double[h];
// functor code goes here
// Heap memory has context scope, so delete is necessary to stop leaks
delete[] data;
}
This will work on GPUs of compute capability 2.0 or newer. The downside is that the allocation comes from the runtime heap in global memory, which limits compiler optimisations, and the new/delete operators themselves are very slow, so running them in every thread of the kernel launch will cost a lot of performance.
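For reference, a minimal, self-contained sketch of the dynamically allocating version might look like the following. The names `func_dyn`, `out`, `HD` and `run_dyn` are made up for illustration, and the "work" (filling the scratch array and averaging it) is only a placeholder for the real functor body. The null check is a defensive assumption about how a failed allocation would show up in device code; on the host, `new` throws instead.

```cpp
#include <vector>

// With nvcc the functor is usable on host and device; with a plain C++
// compiler the qualifiers expand to nothing (HD is a hypothetical macro).
#ifdef __CUDACC__
#define HD __host__ __device__
#else
#define HD
#endif

// Hypothetical functor: fills a scratch array of run-time length h and
// writes its mean to out[id].
struct func_dyn {
    int h;        // scratch length, known only at run time
    double *out;  // output buffer with one slot per id (assumed)

    func_dyn(int h_, double *out_) : h(h_), out(out_) {}

    HD void operator()(int id) const {
        double *data = new double[h];  // one heap allocation per call
        if (data == nullptr) {         // defensive: treat as failed allocation
            out[id] = -1.0;
            return;
        }
        double sum = 0.0;
        for (int k = 0; k < h; ++k) {
            data[k] = id * h + k;      // placeholder work
            sum += data[k];
        }
        out[id] = sum / h;
        delete[] data;                 // release the scratch array
    }
};

// Host-side helper (hypothetical): runs the functor for a single id.
inline double run_dyn(int h, int id) {
    std::vector<double> out(id + 1);
    func_dyn f(h, out.data());
    f(id);
    return out[id];
}
```

On the device the same functor would typically be driven by something like `thrust::for_each` over a counting iterator, with `out` pointing at device memory.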
Alternatively, if the range of values h can take is limited, consider replacing h within the operator code with a template parameter and using a selector for the known cases, so something like
template<int j>
__device__ __host__
void guts(int id) {
double data[j];
// code here
}
__device__ __host__
void guts_rt(int id) {
double *data = new double[h];
// code here
delete[] data;
}
__device__ __host__
void operator()(int id) {
switch (h) {
case 2:
guts<2>(id);
break;
case 4:
guts<4>(id);
break;
// As many as needed here
default:
guts_rt(id);
break;
}
}
i.e. try to use hard-coded array sizes where possible (which the compiler can optimize for), and fall back to a dynamic solution otherwise (provided your GPU actually supports dynamic allocation of heap memory).
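Putting the pieces together, a complete version of the selector approach could be sketched as below. This is only an illustration under assumptions: the `out` buffer, the shared `work` helper, the `HD` macro and the `run_func` driver are hypothetical names, and the summation stands in for whatever the functor really computes.

```cpp
#include <vector>

// With nvcc the functor is usable on host and device; with a plain C++
// compiler the qualifiers expand to nothing (HD is a hypothetical macro).
#ifdef __CUDACC__
#define HD __host__ __device__
#else
#define HD
#endif

// Hypothetical functor: for each id, fills a scratch array of length h with
// id, id+1, ... and writes the sum to out[id].
struct func {
    int h;        // scratch length, known only at run time
    double *out;  // output buffer with one slot per id (assumed)

    func(int h_, double *out_) : h(h_), out(out_) {}

    // Shared body: operates on any scratch buffer of length n.
    HD static double work(double *data, int n, int id) {
        double sum = 0.0;
        for (int k = 0; k < n; ++k) {
            data[k] = id + k;
            sum += data[k];
        }
        return sum;
    }

    // Compile-time size: a fixed-size local array the compiler can optimize.
    template <int N>
    HD void guts(int id) const {
        double data[N];
        out[id] = work(data, N, id);
    }

    // Run-time fallback: one heap allocation per call.
    HD void guts_rt(int id) const {
        double *data = new double[h];
        out[id] = work(data, h, id);
        delete[] data;
    }

    // Selector: known sizes use the templated path, the rest fall back.
    HD void operator()(int id) const {
        switch (h) {
            case 2: guts<2>(id); break;
            case 4: guts<4>(id); break;
            default: guts_rt(id); break;
        }
    }
};

// Host-side driver (hypothetical): applies the functor to ids 0..n-1.
inline std::vector<double> run_func(int h, int n) {
    std::vector<double> out(n);
    func f(h, out.data());
    for (int id = 0; id < n; ++id) f(id);
    return out;
}
```

Keeping the real work in one shared helper means the templated and fallback paths cannot drift apart; only the way the scratch storage is obtained differs between them.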