Storing a device_vector inside a functor through the constructor?

I am trying to store a thrust::device_vector inside a functor. A simplified version is as follows:

struct StructOperator : public thrust::unary_function<float, int>  {
  int num_;
  thrust::device_vector<int> v_test;

  explicit StructOperator(thrust::device_vector<int> const& input_v) :
    v_test(input_v), num_(input_v.size()) {};

  __host__ __device__
   float operator()(int index) {
      // magic happens
   }
};

This doesn't compile: nvcc keeps saying that calling a __host__ function from a __host__ __device__ function is not allowed. I have seen this question - is this the only way to achieve this?

Solution

When you put the __device__ decorator on your functor's operator(), you restrict what you can do in that operator body to things that are compatible with CUDA device code.

A thrust::device_vector is a class that is designed to facilitate thrust's expression/computation model (roughly similar to STL containers/algorithms). As such it includes both host and device code. The host code in a thrust::device_vector is not decorated for use on the device, and ordinary host code is not usable in CUDA device code.

thrust::device_vector is neither designed nor intended to be used directly in device code. It cannot be used as you have suggested. Contrary to what might be surmised, it is not designed to be an analog of std::vector that is usable in CUDA device code. It is designed to be an analog of std::vector that is usable in thrust algorithms (which, by design, are callable/usable from host code). This is why you are getting error messages when compiling, and there is no trivial way(*) to fix that.

Presumably the primary purpose of the thrust::device_vector is to act as a container to hold data that is usable/accessible on the device. The most direct equivalent in POD type data that is already supported in CUDA device code would be an array or a pointer to data.
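To illustrate the pointer equivalence described above, here is a minimal sketch in which the device_vector objects stay in host code and device code only ever sees bare pointers. The kernel add_half and the helper add_half_on_device are hypothetical names made up for this illustration, and error checking is omitted:

```cuda
#include <vector>
#include <thrust/device_vector.h>
#include <thrust/sequence.h>
#include <thrust/copy.h>

// Hypothetical kernel: device code sees only bare pointers and a length.
__global__ void add_half(const int *in, float *out, int n) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) out[i] = in[i] + 0.5f;
}

// Hypothetical host helper (expects n > 0): the device_vector objects
// are created, used and destroyed entirely in host code.
std::vector<float> add_half_on_device(int n) {
  thrust::device_vector<int> d_in(n);
  thrust::sequence(d_in.begin(), d_in.end());   // fills 0, 1, 2, ...
  thrust::device_vector<float> d_out(n);

  const int threads = 128;
  const int blocks = (n + threads - 1) / threads;
  // raw_pointer_cast extracts the underlying device address from the vector.
  add_half<<<blocks, threads>>>(thrust::raw_pointer_cast(d_in.data()),
                                thrust::raw_pointer_cast(d_out.data()), n);
  cudaDeviceSynchronize();

  // Copy the results back to the host.
  std::vector<float> h_out(n);
  thrust::copy(d_out.begin(), d_out.end(), h_out.begin());
  return h_out;
}
```

Calling add_half_on_device(3) from main should give the three values 0.5, 1.5 and 2.5. The same pointer extraction works for a functor passed to a thrust algorithm instead of a hand-written kernel.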

Therefore I think it's reasonable to answer your question with "yes" - that is the only way to achieve this.

In saying so, I am lumping in a variety of similar approaches, such as passing a thrust pointer instead of a bare pointer.
(*) I am ignoring such ideas as writing your own container class that allows usage on the device, or making extensive modifications to thrust itself to somehow permit this behavior.

Here is a fully worked example, built around what you have shown:

$ cat t1385.cu
#include <iostream>
#include <iterator>
#include <thrust/device_vector.h>
#include <thrust/sequence.h>
#include <thrust/transform.h>
#include <thrust/copy.h>


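// unary_function<Argument, Result>: the operator takes an int and returns a float
struct StructOperator : public thrust::unary_function<int, float>  {
  int num_;
  int *v_test;   // bare device pointer; the device_vector itself stays in host code

  // initializers listed in member declaration order
  explicit StructOperator(int *input_v, int input_v_size) :
    num_(input_v_size), v_test(input_v) {}

  __host__ __device__
   float operator()(int index) {
      if (index < num_)  return v_test[index] + 0.5f;
      return 0.0f;
   }
};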

const int ds = 3;
int main(){

  thrust::device_vector<int> d(ds);
  thrust::sequence(d.begin(), d.end());
  thrust::device_vector<float> r(ds);
  thrust::transform(d.begin(), d.end(), r.begin(), StructOperator(thrust::raw_pointer_cast(d.data()), d.size()));
  thrust::copy(r.begin(), r.end(), std::ostream_iterator<float>(std::cout, ","));
  std::cout << std::endl;
}
$ nvcc t1385.cu -o t1385
$ ./t1385
0.5,1.5,2.5,
$
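The thrust pointer variant mentioned earlier can be sketched in the same way. This is only an illustration under stated assumptions: PtrOperator and run_ptr_operator are hypothetical names, not part of the code above, and the functor converts the thrust::device_ptr back to a bare pointer inside the operator rather than relying on any other device-side behavior of device_ptr:

```cuda
#include <vector>
#include <thrust/device_vector.h>
#include <thrust/device_ptr.h>
#include <thrust/sequence.h>
#include <thrust/transform.h>
#include <thrust/copy.h>

// Hypothetical variant of StructOperator that stores a thrust::device_ptr
// instead of a bare pointer.
struct PtrOperator {
  thrust::device_ptr<int> v_test;
  int num_;

  PtrOperator(thrust::device_ptr<int> input_v, int input_v_size)
      : v_test(input_v), num_(input_v_size) {}

  // Only ever invoked on the device here, because the transform below
  // runs on device iterators.
  __host__ __device__
  float operator()(int index) const {
    const int *p = thrust::raw_pointer_cast(v_test);
    return (index < num_) ? p[index] + 0.5f : 0.0f;
  }
};

// Hypothetical host helper: d.data() already returns a thrust::device_ptr<int>,
// so no cast is needed at the call site.
std::vector<float> run_ptr_operator(int n) {
  thrust::device_vector<int> d(n);
  thrust::sequence(d.begin(), d.end());   // fills 0, 1, 2, ...
  thrust::device_vector<float> r(n);
  thrust::transform(d.begin(), d.end(), r.begin(),
                    PtrOperator(d.data(), static_cast<int>(d.size())));

  // Copy the results back to the host.
  std::vector<float> h(n);
  thrust::copy(r.begin(), r.end(), h.begin());
  return h;
}
```

With n = 3 this should produce the same three values as the run above. The functor still holds nothing but an address and a length, so it remains cheap to copy to the device by value.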