
thrust dynamic memory allocation for array


I have a functor used with Thrust in which I need to set the length of an array dynamically, like this:

struct func { 

       const int h;

       func(const int _h): h(_h) {}

       __device__ __host__
       void operator()(int id) {
              double data[h];   // does not compile: h is not a compile-time constant
       }
};

I'm not sure how to do this: the array size has to be a compile-time constant, but h is not known until run time.


Solution

  • The obvious way to solve this is to use dynamic memory allocation, so the functor becomes

       __device__ __host__
       void operator()(int id) {
            double *data = new double[h];

            // functor code goes here

            // Device heap allocations persist for the lifetime of the CUDA
            // context, so delete[] is needed to avoid leaks
            delete[] data;
       }
    

    This works on GPUs of compute capability 2.0 or newer. The downside is that the memory comes from the runtime heap in global memory, which limits compiler optimisations, and the new/delete operators themselves are very slow, so doing this in every thread of the kernel launch will cost a lot of performance.
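
A self-contained sketch of the heap-based approach follows. The functor name `func_heap`, the output array `out`, and the fill-and-sum loop are hypothetical placeholders for the real functor code. The `#ifndef __CUDACC__` shim is an assumption added only so the same source also compiles as plain host C++, and the null check is a defensive guard rather than something the approach requires.

```cpp
#include <cassert>

// Shim (assumption): lets the same source compile with nvcc, where
// __host__/__device__ are keywords, and with a plain host C++ compiler.
#ifndef __CUDACC__
#define __host__
#define __device__
#endif

// Hypothetical functor: each call needs h doubles of scratch space, and h is
// only known at run time. `out` is a hypothetical array with one result per id.
struct func_heap {
    const int h;
    double *out;

    func_heap(int _h, double *_out) : h(_h), out(_out) { assert(h > 0); }

    __host__ __device__
    void operator()(int id) const {
        // One heap allocation per call, i.e. per thread on the GPU
        double *data = new double[h];
        if (data == nullptr) return;  // cheap guard in case the allocation fails

        // Placeholder work: fill the scratch array, then reduce it
        for (int k = 0; k < h; ++k)
            data[k] = id + k;
        double sum = 0.0;
        for (int k = 0; k < h; ++k)
            sum += data[k];
        out[id] = sum;

        // Device heap memory persists until the CUDA context is destroyed,
        // so every new[] needs a matching delete[]
        delete[] data;
    }
};
```

On the GPU the functor would be run once per index, for example with `thrust::for_each(thrust::counting_iterator<int>(0), thrust::counting_iterator<int>(n), func_heap(h, thrust::raw_pointer_cast(d_out.data())))`, where `d_out` is a hypothetical `thrust::device_vector<double>` of length `n`. The device heap is 8 MB by default; if roughly `n * h * sizeof(double)` bytes may be allocated at the same time, the limit can be raised with `cudaDeviceSetLimit(cudaLimitMallocHeapSize, bytes)` before launching the kernel.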

    Alternatively, if h can only take a limited range of values, consider replacing h within the operator code with a template parameter and then using a switch to select among the known cases, something like

       template<int j>
       __device__ __host__
       void guts(int id) {
           // j is a compile-time constant, so this is an ordinary fixed-size array
           double data[j];
           // code here
       }

       __device__ __host__
       void guts_rt(int id) {
           // fallback for sizes without a specialised case
           double *data = new double[h];
           // code here
           delete[] data;
       }

       __device__ __host__
       void operator()(int id) {
           switch (h) {
               case 2:
                   guts<2>(id);
                   break;

               case 4:
                   guts<4>(id);
                   break;

               // as many cases as needed here

               default:
                   guts_rt(id);
                   break;
           }
       }
    

    i.e. use fixed-size arrays where possible (which the compiler can optimise), and fall back to a dynamic solution otherwise (provided your GPU supports dynamic allocation of heap memory at all).
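
Put together, the dispatch might look roughly like the compilable sketch below. `func_dispatch`, `out`, and the helper `work` are hypothetical names, the fill-and-sum body stands in for the real functor code, and the `#ifndef __CUDACC__` shim is an assumption so the code also builds as ordinary host C++. Only `h == 2` and `h == 4` get fixed-size fast paths; every other value goes through the heap fallback.

```cpp
#include <cassert>

// Shim (assumption): lets the same source compile with nvcc, where
// __host__/__device__ are keywords, and with a plain host C++ compiler.
#ifndef __CUDACC__
#define __host__
#define __device__
#endif

// Hypothetical functor: fixed-size fast paths plus a heap-allocated fallback.
struct func_dispatch {
    const int h;
    double *out;  // hypothetical output array, one result per id

    func_dispatch(int _h, double *_out) : h(_h), out(_out) { assert(h > 0); }

    // Placeholder work shared by all paths: fill n scratch slots, then sum them
    static __host__ __device__
    double work(double *data, int n, int id) {
        for (int k = 0; k < n; ++k)
            data[k] = id + k;
        double sum = 0.0;
        for (int k = 0; k < n; ++k)
            sum += data[k];
        return sum;
    }

    // Fast path: J is a compile-time constant, so data is a fixed-size local array
    template <int J>
    __host__ __device__
    void guts(int id) const {
        double data[J];
        out[id] = work(data, J, id);
    }

    // Fallback: the size is only known at run time, so allocate on the heap
    __host__ __device__
    void guts_rt(int id) const {
        double *data = new double[h];
        if (data == nullptr) return;  // cheap guard in case the allocation fails
        out[id] = work(data, h, id);
        delete[] data;
    }

    // Runtime h selects a compile-time instantiation where one exists
    __host__ __device__
    void operator()(int id) const {
        switch (h) {
            case 2: guts<2>(id); break;
            case 4: guts<4>(id); break;
            default: guts_rt(id); break;
        }
    }
};
```

Each `case` instantiates another copy of `guts`, so it makes sense to list only the sizes that actually occur; the `default` branch keeps the functor correct for any other `h`.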