Search code examples
cudathrust

Thrust: why always host code is executed in spite of __CUDA_ARCH__


I try to define two branches in code: one for CUDA execution and the other - without it (with future OMP in mind). But when I use macro __CUDA_ARCH__ it looks as if always the host code is executed. But I supposed that Thrust by default use CUDA (and branch for device code). What's wrong with my code? Here it is:

#include <thrust/transform.h>                                 
#include <thrust/functional.h>                                
#include <thrust/iterator/counting_iterator.h>                
#include <stdio.h>                                            

struct my_op                                                  
{                                                             
    my_op(int init_const) : constanta(init_const) {}      
    __host__ __device__ int operator()(const int &x) const
    {                                                     
        #if defined(__CUDA_ARCH__)                    
            return 2 * x * constanta;    // never executed - why?
        #else                                     
            return x * constanta;        // always executed                 
        #endif                       
    }                                                     

private:                                                      
    int constanta;                                        
};                                                            

int main()                                                    
{                                                             
 int data[7] = { 0, 0, 0, 0, 0, 0, 0 };                        
 thrust::counting_iterator<int> first(10);                     
 thrust::counting_iterator<int> last = first + 7;              

 int init_value = 1;                                           
 my_op op(init_value);                                         

 thrust::transform(first, last, data, op);                     
 for each (int el in data)                                     
    std::cout << el << " ";                               

 std::cout << std::endl;                                       
}                  

I expect that "transform" will define vector as multiplied by 2*constanta but I see that host code is used - the output is "10 11 12 13 14 15 16", not "20 22 24 26 28 30 32" (as expected).

Why?


Solution

  • Thrust is choosing the host path because one of your data items supplied to the thrust transform operation is in host memory:

     thrust::transform(first, last, data, op); 
                                    ^^^^
    

    If you want a thrust algorithm to operate on the device, generally speaking all the container data you pass to/from must also reside in device memory.

    Here's a modification to your code that demonstrates that thrust will follow the device path if we replace data with a device-resident container:

    $ cat t13.cu
    #include <thrust/transform.h>
    #include <thrust/functional.h>
    #include <thrust/iterator/counting_iterator.h>
    #include <thrust/device_vector.h>
    #include <stdio.h>
    
    struct my_op
    {
        my_op(int init_const) : constanta(init_const) {}
        __host__ __device__ int operator()(const int &x) const
        {
            #if defined(__CUDA_ARCH__)
                return 2 * x * constanta;    // never executed - why?
            #else
                return x * constanta;        // always executed
            #endif
        }
    
    private:
        int constanta;
    };
    
    int main()
    {
    // int data[7] = { 0, 0, 0, 0, 0, 0, 0 };
     thrust::counting_iterator<int> first(10);
     thrust::counting_iterator<int> last = first + 7;
     thrust::device_vector<int> d_data(7);
    
     int init_value = 1;
     my_op op(init_value);
    
     thrust::transform(first, last, d_data.begin(), op);
     for (int el = 0; el < 7; el++) {
        int dat = d_data[el];
        std::cout << dat  << " ";    }
    
     std::cout << std::endl;
    }
    $ nvcc -arch=sm_61 -o t13 t13.cu
    $ ./t13
    20 22 24 26 28 30 32
    $
    

    You may want to read the thrust quick start guide to learn about thrust algorithm dispatch.