
Thrust - Initial device_vector


Thanks for replying to my question @Eric Shiyin Kang, but the "host" or "device" prefix was not the cause of my problem. After some trial and error, I found the error is that member data behaves as if it were constant inside the functor. For example:

struct OP {
    int N;
    __host__ __device__
    OP(const int n) : N(n) {}

    __host__ __device__
    int operator()(const int a) {
        int b = a * N;
        N++;  // attempt to carry state from one call to the next
        return b;
    }
};

thrust::transform(A.begin(), A.end(), B.begin(), OP(2));

In this case, if A is {0, 1, 2, 3, ...}, then B comes out as {0, 2, 4, 6, 8}, but I expected B to be {0, 3 (1*(2+1)), 8 (2*(3+1)), 15 (3*(4+1)), ...}.

I don't know what causes this. Is it by design in Thrust? Could anyone tell me?


Solution

  • For your updated question: the member variable N cannot be updated across calls in device code. Each thread operates on its own copy of the functor, so increments to N are never shared between invocations. It is generally not safe to update shared state from inside a parallel algorithm.

    In fact the fastest way to initialize a dev vector should be using fancy iterators during the object construction stage like,

    // requires: using namespace thrust::placeholders;
    // v[]={0,2,4,6,8...}
    thrust::device_vector<float> v(
            thrust::make_transform_iterator(
                    thrust::counting_iterator<float>(0.0),
                    _1 * 2.0),
            thrust::make_transform_iterator(
                    thrust::counting_iterator<float>(0.0),
                    _1 * 2.0) + SIZE);
    
    // u[]={0,3,8,15...}
    thrust::device_vector<float> u(
            thrust::make_transform_iterator(
                    thrust::counting_iterator<float>(0.0),
                    _1 * (_1 + 2.0)),
            thrust::make_transform_iterator(
                    thrust::counting_iterator<float>(0.0),
                    _1 * (_1 + 2.0)) + SIZE);
    

    It will be a few times faster than the define-sequence-then-transform approach, since the latter reads and writes the whole device vector more than once.

    Please note that the above code only works with Thrust 1.6.0+, since lambda placeholder expressions are used with the fancy iterators. For Thrust 1.5.3 in CUDA 5.0, you have to write the functor explicitly.
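A sketch of an explicit-functor equivalent of `_1 * (_1 + 2.0)` (the functor name is mine; the `__host__`/`__device__` guard just lets the same code compile as plain C++ when not built by nvcc):

```cpp
// Allow the functor to compile as plain C++ when not built by nvcc.
#ifndef __CUDACC__
#define __host__
#define __device__
#endif

// Explicit functor replacing the placeholder expression _1 * (_1 + 2.0),
// for Thrust versions without placeholder support (before 1.6.0).
struct TimesSelfPlusTwo {
    __host__ __device__
    float operator()(float x) const { return x * (x + 2.0f); }
};

// Usage with the fancy-iterator construction would then look like:
//   thrust::device_vector<float> u(
//           thrust::make_transform_iterator(
//                   thrust::counting_iterator<float>(0.0f),
//                   TimesSelfPlusTwo()),
//           thrust::make_transform_iterator(
//                   thrust::counting_iterator<float>(0.0f),
//                   TimesSelfPlusTwo()) + SIZE);
```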


    As for the original question, which has since been deleted:

    You could add both __host__ and __device__ qualifiers in front of the operator()(), e.g. an in-place version:

    struct OP {
        __host__ __device__ void operator()(int &a) {
            a *= 2;
        }
    };
    

    or a value-returning version:

    struct OP {
        __host__ __device__ int operator()(int a) {
            return a * 2;
        }
    };
    

    Otherwise the compiler won't generate proper device code for the GPU.