Search code examples
cudathrust

Use thrust to find element in groups


I have two int vectors for keys and values, their size is about 500K.

The key vector is already sorted. And there are 10K groups approximately.

The value is non-negative(stands for useful) or -2(stands for no use), in each group there should be one or zero non-negative values, and the rest is -2.

key:   0  0  0  0 1  2  2  3  3  3  3 
value:-2 -2  1 -2 3 -2 -2 -2 -2 -2  0

The third pair of group 0 [0 1] is useful. For group 1 we get the pair[1 3]. The values of group 2 are all -2, so we get nothing. And for group 3, the result is [3 0].

So, the question is how can I do this by thrust or cuda ?

Here are two ideas.

First one: Get the number of each group by a histogram algorithm. So the barrier of each group can be computed. Operate thrust::find_if on each group to get the useful element.

Second one: Use thrust::transform to add 2 for every value and now all the value are non-negative, and zero stands for useless. Use thrust::reduce_by_key to get the reduction for every group, and then subtract 2 for every output value.

I think there must be some other methods which will achieve much more performance than the above two.

Performance of the methods:

I have test the Second method above and the method given by @Robert Crovella, ie. reduce_by_key and remove_if method. The size of the vectors is 2691028, the vectors consist of 100001 groups. Here is their average time:

reduce_by_key: 1204ms
remove_if: 192ms

From above result, we can see that remove_if method is much faster. And also the "remove_if" method is easy to implement and consume much less gpu memory.

Briefly, @Robert Crovella 's method is very good.


Solution

  • I would use a thrust::zip_iterator to zip the key and values pairs together, and then I would do a thrust::remove_if operation on the zipped values which would require a functor definition that would indicate to remove every pair for which the value is negative (or whatever test you wish.)

    Here's a worked example:

    $ cat t1009.cu
    #include <thrust/remove.h>
    #include <thrust/device_vector.h>
    #include <thrust/iterator/zip_iterator.h>
    #include <thrust/copy.h>
    
    struct remove_func
    {
      template <typename T>
      __host__ __device__
      bool operator()(T &t){
        return (thrust::get<1>(t) < 0); // could change to other kinds of tests
        }
    };
    
    int main(){
    
      int keys[] = {0,0,0,0,1,2,2,3,3,3,3};
      int vals[] = {-2,-2,1,-2,3,-2,-2,-2,-2,-2,0};
      size_t dsize = sizeof(keys)/sizeof(int);
    
      thrust::device_vector<int>dkeys(keys, keys+dsize);
      thrust::device_vector<int>dvals(vals, vals+dsize);
      auto zr = thrust::make_zip_iterator(thrust::make_tuple(dkeys.begin(), dvals.begin()));
    
      size_t rsize = thrust::remove_if(zr, zr+dsize, remove_func()) - zr;
    
      thrust::copy_n(dkeys.begin(), rsize, std::ostream_iterator<int>(std::cout, ","));
      std::cout << std::endl;
      thrust::copy_n(dvals.begin(), rsize, std::ostream_iterator<int>(std::cout, ","));
      std::cout << std::endl;
      return 0;
    }
    $ nvcc -std=c++11 -o t1009 t1009.cu
    $ ./t1009
    0,1,3,
    1,3,0,
    $