I have 100000 arrays that are stored in a big 1D array. The size of each array is L =1000 and are ordered in descending order order.
Each array is divided in m=5 segments.(S1,S2,S3,S4,S5) For example :
A1=[S1,S2,S3,S4,S5]
A2=[S1,S2,S3,S4,S5]
A3=[S1,S2,S3,S4,S5]
…
A100000=[S1,S2,S3,S4,S5]
Here an example of the big container array is as follows:
My question is :
For each window of w=10 arrays (for example) I want to reorganize the 10 arrays as follows: Put the segments S1 of each of these 10 arrays first, after that put S2 segments , S3 segments …. Hereunder an example with w =6 :
As information L , m and w are parameters that may take on different values.
Is there a fast way to do it with Thrust ?
the methodology will be very similar to what is described here.
thrust::counting_iterator
to provide an ordinal sequence 0, 1, 2,..., and then construct the map iterator using a thrust::transform_iterator
. The challenge is creating the correct arithmetic for the operation passed to the transform iterator. The code below is commented extensively to explain that. thrust::permutation_iterator
, to thrust::copy_n
. Note that sometimes good advice is to not copy data. If you are only using this "transformed view" of the data once, then just pass that permutation iterator to whatever thrust algorithm you are using once, rather than actually copying the data. If you need to use the "transformed view" of the data many times, it may be more efficient to copy it once.Here is a worked example:
$ cat t5.cu
#include <thrust/sequence.h>
#include <thrust/iterator/permutation_iterator.h>
#include <thrust/iterator/transform_iterator.h>
#include <thrust/iterator/discard_iterator.h>
#include <thrust/device_vector.h>
#include <thrust/copy.h>
#include <iostream>
typedef int dtype; // data type
typedef int itype; // indexing type - change to size_t if L*m*s > int
const itype s = 3; // segment_width
const itype m = 2; // number of segments per sub-array
const itype L = 4; // number of sub-arrays per array
const itype w = 2; // width of group (# of sub-arrays) for segment reordering
// S1 S1 S2 S2 S1 S1 S2 S2
// 0 1 2 6 7 8 3 4 5 9 10 11 12 13 14 18 19 20 15 16 17 21 22 23
using namespace thrust::placeholders;
int main(){
// we require data that consists of L*m*s elements
// we also require L to be whole-number divisible by w
thrust::device_vector<dtype> d_data(L*m*s);
thrust::device_vector<dtype> d_result(L*m*s);
thrust::sequence(d_data.begin(), d_data.end());
// we will build up the necessary map iterator progressively
// we will start with an target_index sequence i = 0, 1, 2, 3, 4, 5, ... which defines the location in the target vector
// seg_idx (position within a segment) = i%s 0, 1, 2, 0, 1, 2, ...
// which_seg (which segment within sub-array) = (i/(w*s))%m 0, 0, 0, 0, 0, 0, 1, ...
// which_sub (which sub-array in group) = (i/s)%w 0, 0, 0, 1, 1, 1, 0, ...
// which_grp (which group in array) = i/(w*s*m) 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1 ...
// map index = which_grp*group_width + which_sub*subarray_width + which_seg*segment_width + seg_idx
// map index = (i/(w*s*m))*(w*s*m) + ((i/s)%w)*(s*m) + ((i/(w*s))%m)*s + i%s
thrust::copy_n(thrust::make_permutation_iterator(d_data.begin(), thrust::make_transform_iterator(thrust::counting_iterator<itype>(0), (_1/(w*s*m))*(w*s*m) + ((_1/s)%w)*(s*m) + ((_1/(w*s))%m)*s + _1%s)), L*m*s, d_result.begin());
thrust::copy(d_result.begin(), d_result.end(), std::ostream_iterator<dtype>(std::cout, ","));
std::cout << std::endl;
}
$ nvcc -o t5 t5.cu
$ ./t5
0,1,2,6,7,8,3,4,5,9,10,11,12,13,14,18,19,20,15,16,17,21,22,23,
$
Note that if the overall data length (L*m*s
) is greater than what can be held in a int
quantity, then the above code would have to be refactored to use size_t
instead of int
for the itype
typedef.