
Calling Thrust algorithms inside a Thrust functor


I am using thrust::reduce inside a functor that is passed as an argument to thrust::transform_reduce, so the situation is effectively a nested Thrust algorithm. Compilation succeeds, but the program fails at run time with this error:

terminate called after throwing an instance of 'thrust::system::system_error'
  what():  cudaEventSynchronize in future::wait: an illegal memory access was encountered
Aborted (core dumped)

The code is as follows:

#include <thrust/inner_product.h>
#include <thrust/functional.h>
#include <thrust/device_vector.h>

#include <iostream>
#include <cmath>
#include <boost/concept_check.hpp>


struct aFuntor : public thrust::unary_function<int, int>
{
    aFuntor(int* av__, int* bv__, const int& N__) : av_(av__), bv_(bv__), N_(N__) {};

    __host__ __device__
    int operator()(const int& idx)
    {

    thrust::device_ptr<int> av_dpt = thrust::device_pointer_cast(av_);

    int res = thrust::reduce(av_dpt, av_dpt+N_);

        return res;
    }

    int* av_;
    int* bv_;
    int N_;
};


int main(void)
{
      int N = 5;
      std::vector<int> av = {0,1,3,5};
      std::vector<int> bv = {0,10,20,30};
      thrust::device_vector<int> av_d(N);
      thrust::device_vector<int> bv_d(N);
      av_d = av; bv_d = bv;

      // initial value of the reduction
      int init=0;

      // binary operations
      thrust::plus<int>        bin_op;

      int res =
      thrust::transform_reduce(thrust::counting_iterator<int>(0),
                               thrust::counting_iterator<int>(N-1),
                   aFuntor(thrust::raw_pointer_cast(av_d.data()), 
                      thrust::raw_pointer_cast(bv_d.data()),
                      N),
                init,
                bin_op);    

      std::cout << "result is: " << res << std::endl;
      return 0;
}

Does Thrust support this kind of nested structure, or is there no way around redesigning my algorithm? AFAIK there are algorithms in which parallelism is difficult to expose.

Thank you in advance!


Solution

  • Thrust allows nested algorithm usage. However, when you launch a Thrust algorithm from device code, you must make sure that Thrust takes only the device path, and in your case that is not happening. At least on my system (Ubuntu 14.04), compiling your code as-is gives an indication of this:

    t113.cu(20) (col. 9): warning: calling a __host__ function("thrust::reduce< ::thrust::device_ptr<int> > ") from a __host__ __device__ function("aFuntor::operator ()") is not allowed
    

    So that is clearly not what is wanted here. Instead, we can force Thrust to use the device path with the thrust::device execution policy. (That the nested call runs in device code is essentially implicit in your functor definition, since you are passing device pointers.) With the following changes (including thrust/execution_policy.h, passing thrust::device to thrust::reduce, and commenting out the unused Boost header), your code compiles and runs without error for me:

    $ cat t113.cu
    #include <thrust/inner_product.h>
    #include <thrust/functional.h>
    #include <thrust/device_vector.h>
    
    #include <iostream>
    #include <cmath>
    #include <thrust/execution_policy.h>
    //#include <boost/concept_check.hpp>
    
    
    struct aFuntor : public thrust::unary_function<int, int>
    {
        aFuntor(int* av__, int* bv__, const int& N__) : av_(av__), bv_(bv__), N_(N__) {};
    
        __host__ __device__
        int operator()(const int& idx)
        {
    
            thrust::device_ptr<int> av_dpt = thrust::device_pointer_cast(av_);
    
            // thrust::device explicitly selects the device path for the nested call
            int res = thrust::reduce(thrust::device, av_dpt, av_dpt+N_);
    
            return res;
        }
    
        int* av_;
        int* bv_;
        int N_;
    };
    
    
    int main(void)
    {
          int N = 5;
          std::vector<int> av = {0,1,3,5};
          std::vector<int> bv = {0,10,20,30};
          thrust::device_vector<int> av_d(N);
          thrust::device_vector<int> bv_d(N);
          av_d = av; bv_d = bv;
    
          // initial value of the reduction
          int init=0;
    
          // binary operations
          thrust::plus<int>        bin_op;
    
          int res =
          thrust::transform_reduce(thrust::counting_iterator<int>(0),
                                   thrust::counting_iterator<int>(N-1),
                       aFuntor(thrust::raw_pointer_cast(av_d.data()),
                          thrust::raw_pointer_cast(bv_d.data()),
                          N),
                    init,
                    bin_op);
    
          std::cout << "result is: " << res << std::endl;
          return 0;
    }
    $ nvcc -std=c++11 -arch=sm_61  -o t113 t113.cu
    $ ./t113
    result is: 36
    $
    

    I haven't actually tried to parse your intent from the code, so I can't say for sure that this is the correct answer, but that doesn't seem to be the question you are asking. (Later: the answer seems correct. Your functor produces the value 9 for every element, and you are reducing that 9 across 4 elements: 9 x 4 = 36.)
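
    The arithmetic in the note above can be checked on the host with plain C++. This is only a minimal sketch, not the Thrust code path: inner_sum and outer_sum are hypothetical stand-ins for the functor and for thrust::transform_reduce, and the sketch sums exactly the four stored values {0, 1, 3, 5}. The posted code passes N = 5 to the functor, so this assumes the fifth element it reads contributes 0, which is consistent with the reported result.

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// Hypothetical host-side stand-in for aFuntor::operator(): it ignores idx
// and returns the sum of the stored values, like the nested thrust::reduce.
int inner_sum(const std::vector<int>& av, int /*idx*/)
{
    return std::accumulate(av.begin(), av.end(), 0);
}

// Hypothetical stand-in for the outer thrust::transform_reduce over the
// index range [first, last), with init = 0 and addition as the reduction.
int outer_sum(const std::vector<int>& av, int first, int last)
{
    assert(first <= last);  // the index range must not be reversed
    int total = 0;
    for (int idx = first; idx < last; ++idx) {
        total += inner_sum(av, idx);  // every index contributes the same sum
    }
    return total;
}
```

    With av = {0, 1, 3, 5} and the index range [0, 4), which is what counting_iterator<int>(0) to counting_iterator<int>(N-1) covers for N = 5, inner_sum returns 9 for each index and outer_sum returns 36.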

    Having said all that, it's not entirely clear (to me) why Thrust chooses the host path in your original case. If you like, you could file a Thrust issue for that. But it's entirely possible that I haven't thought through the Thrust dispatch system carefully enough. For example, the algorithm dispatch in host code (transform_reduce) might be somewhat confusing to Thrust because it may not be obvious whether you are using host or device containers.