Search code examples
cudagpgputhrust

How to reorder subarrays of fixed size inside a big 1D array in Thrust


I have 100000 arrays that are stored in a big 1D array. The size of each array is L =1000 and are ordered in descending order order.

Each array is divided in m=5 segments.(S1,S2,S3,S4,S5) For example :

A1=[S1,S2,S3,S4,S5]
A2=[S1,S2,S3,S4,S5]
A3=[S1,S2,S3,S4,S5]
…
A100000=[S1,S2,S3,S4,S5]

Here an example of the big container array is as follows: enter image description here

My question is :

For each window of w=10 arrays (for example) I want to reorganize the 10 arrays as follows: Put the segments S1 of each of these 10 arrays first, after that put S2 segments , S3 segments …. Hereunder an example with w =6 : enter image description here

As information L , m and w are parameters that may take on different values.

Is there a fast way to do it with Thrust ?


Solution

  • the methodology will be very similar to what is described here.

    1. construct a map vector (i.e. sequence produced by iterator) which will map input to output. We will use a thrust::counting_iterator to provide an ordinal sequence 0, 1, 2,..., and then construct the map iterator using a thrust::transform_iterator. The challenge is creating the correct arithmetic for the operation passed to the transform iterator. The code below is commented extensively to explain that.
    2. pass that map vector, via a thrust::permutation_iterator, to thrust::copy_n. Note that sometimes good advice is to not copy data. If you are only using this "transformed view" of the data once, then just pass that permutation iterator to whatever thrust algorithm you are using once, rather than actually copying the data. If you need to use the "transformed view" of the data many times, it may be more efficient to copy it once.

    Here is a worked example:

    $ cat t5.cu
    #include <thrust/sequence.h>
    #include <thrust/iterator/permutation_iterator.h>
    #include <thrust/iterator/transform_iterator.h>
    #include <thrust/iterator/discard_iterator.h>
    #include <thrust/device_vector.h>
    #include <thrust/copy.h>
    #include <iostream>
    
    typedef int dtype;  // data type
    typedef int itype;  // indexing type - change to size_t if L*m*s > int
    const itype s = 3;  // segment_width
    const itype m = 2;  // number of segments per sub-array
    const itype L = 4;  // number of sub-arrays per array
    const itype w = 2;  // width of group (# of sub-arrays) for segment reordering
    // S1 S1 S2 S2 S1 S1 S2 S2
    //  0  1  2  6  7  8  3  4  5  9 10 11 12 13 14 18 19 20 15 16 17 21 22 23
    
    using namespace thrust::placeholders;
    
    int main(){
    // we require data that consists of L*m*s elements
    // we also require L to be whole-number divisible by w
      thrust::device_vector<dtype> d_data(L*m*s);
      thrust::device_vector<dtype> d_result(L*m*s);
      thrust::sequence(d_data.begin(), d_data.end());
    // we will build up the necessary map iterator progressively
    // we will start with an target_index sequence i =              0, 1, 2, 3, 4, 5, ... which defines the location in the target vector
    // seg_idx    (position within a segment) = i%s                 0, 1, 2, 0, 1, 2, ...
    // which_seg  (which segment within sub-array) = (i/(w*s))%m    0, 0, 0, 0, 0, 0, 1, ...
    // which_sub  (which sub-array in group) = (i/s)%w              0, 0, 0, 1, 1, 1, 0, ...
    // which_grp  (which group in array) = i/(w*s*m)                0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1 ...
    // map index = which_grp*group_width + which_sub*subarray_width + which_seg*segment_width + seg_idx
    // map index = (i/(w*s*m))*(w*s*m) + ((i/s)%w)*(s*m) + ((i/(w*s))%m)*s + i%s
      thrust::copy_n(thrust::make_permutation_iterator(d_data.begin(), thrust::make_transform_iterator(thrust::counting_iterator<itype>(0), (_1/(w*s*m))*(w*s*m) + ((_1/s)%w)*(s*m) + ((_1/(w*s))%m)*s + _1%s)), L*m*s, d_result.begin());
      thrust::copy(d_result.begin(), d_result.end(), std::ostream_iterator<dtype>(std::cout, ","));
      std::cout << std::endl;
    }
    $ nvcc -o t5 t5.cu
    $ ./t5
    0,1,2,6,7,8,3,4,5,9,10,11,12,13,14,18,19,20,15,16,17,21,22,23,
    $
    

    Note that if the overall data length (L*m*s) is greater than what can be held in a int quantity, then the above code would have to be refactored to use size_t instead of int for the itype typedef.