Search code examples
cudanvidiagpgputhrust

How to reduce nonconsecutive segments of numbers in array with Thrust


I have 1D array "A" which is composed from many arrays "a" like this : An example of an array "a" in "A"

I'm implementing a code to sum up non consecutive segments ( sum up the numbers in the segments of the same color of each array "a" in "A" as follow:

An example for one array "a" in "A"

Any ideas to do that efficiently with thrust?

Thank you very much

Note: The pictures represents only one array "a". The big array "A" contains many arrays "a"


Solution

  • In the general case, where the ordering of the data and grouping by segments is not known in advance, the general suggestion is to use thrust::sort_by_key to group like segments together, and then use thrust::reduce_by_key to sum the segments. Examples are given here.

    However, if the input data segments follow a known repeating pattern, such as is suggested here, we can eliminate the sorting step by using a thrust::permutation_iterator to "gather" the like segments together, as the input to thrust::reduce_by_key.

    Using the example data in the question, the hard part of this is to create the permutation iterator. For that, and using the specific number of segment types (3), segment lengths (3) and number of segments per segment type (3) given in the question, we need a map "vector" (i.e. iterator) for our permutation iterator that has the following sequence:

    0  1  2 9 10 11 18 19 20 3 4 5 12 13 14 21 22 23 ...
    

    This sequence would then "map" or rearrange the input array, so that all like segments are grouped together. I'm sure there are various ways to create such a sequence, but the approach I chose is as follows. We will start with the standard counting iterator sequence, and then apply a transform functor to it (using make_transform_iterator), so that we create the above sequence. I chose to do it using the following method, arranged in a stepwise sequence showing the components that are added together:

    counting iterator: (_1)                         0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 ...
    ---------------------------------------------------------------------------------------------------
    ((_1/seg_len)%seg_types)*(seg_len*seg_types):   0  0  0  9  9  9 18 18 18  0  0  0  9  9  9 18 18 18 ...
     _1%seg_len:                                    0  1  2  0  1  2  0  1  2  0  1  2  0  1  2  0  1  2 ...
    _1/(seg_len*seg_types)*seg_len:                 0  0  0  0  0  0  0  0  0  3  3  3  3  3  3  3  3  3 ...
     Sum:                                           0  1  2  9 10 11 18 19 20  3  4  5 12 13 14 21 22 23 ...            
    

    Here is a fully worked example:

    $ cat t457.cu
    #include <thrust/reduce.h>
    #include <thrust/iterator/permutation_iterator.h>
    #include <thrust/iterator/transform_iterator.h>
    #include <thrust/iterator/discard_iterator.h>
    #include <thrust/device_vector.h>
    #include <thrust/copy.h>
    #include <iostream>
    
    typedef int dtype;
    const int seg_len = 3;
    const int seg_types = 3;
    
    using namespace thrust::placeholders;
    
    int main(){
    
      dtype data[] = {10,16,14,2,4,4,1,2,1,8,2,10,3,1,6,8,0,2,9,1,0,3,5,2,3,2,1};
      //                0   1  2 9 10 11 18 19 20 3 4 5 12 13 14 21 22 23 ...
      //              ((_1/seg_len)%seg_types)*(seg_len*seg_types) + _1%seg_len + (_1/(seg_len*seg_types)*seg_len
    
      int ads = sizeof(data)/sizeof(data[0]);
      int num_groups = ads/(seg_len*seg_types); // ads is expected to be whole-number divisible by seg_len*seg_types
      int ds = num_groups*(seg_len*seg_types);  // handle the case when it is not
      thrust::device_vector<dtype> d_data(data, data+ds);
      thrust::device_vector<dtype> d_result(seg_types);
      thrust::reduce_by_key(thrust::make_transform_iterator(thrust::counting_iterator<int>(0), _1/(ds/seg_types)), thrust::make_transform_iterator(thrust::counting_iterator<int>(ds), _1/(ds/seg_types)), thrust::make_permutation_iterator(d_data.begin(), thrust::make_transform_iterator(thrust::counting_iterator<int>(0), ((_1/seg_len)%seg_types)*(seg_len*seg_types) + _1%seg_len + (_1/(seg_len*seg_types)*seg_len))), thrust::make_discard_iterator(), d_result.begin());
      thrust::copy(d_result.begin(), d_result.end(), std::ostream_iterator<dtype>(std::cout, ","));
      std::cout << std::endl;
    }
    $ nvcc -o t457 t457.cu
    $ ./t457
    70,30,20,
    $