I'm facing a problem with the reduce_by_key function of the Thrust library. It looks like a bug to me, but I'd like to be sure before reporting it.
First, my setup: CUDA 7.0, Windows 8, NVIDIA GeForce 820M. Everything is compiled with Visual Studio 2010 and nvcc in release mode, 64-bit.
Now, the exercise that illustrates the problem.
I have a vector of random numbers called devData, generated on my device.
I tabulate a vector of indices called devIndices, of the same size, defined as follows:
devIndices = {0, 0, 0, 0, 1, 1, 1, 1, ... K-1, K-1, K-1, K-1}
devData = { 1, 4, 5, 7, 5, 8, 9, 6, ... 7, 8, 9, 6}
so that each value in devIndices is repeated mod = 4 times in this example.
Then I just want to reduce_by_key devData using devIndices to obtain the following reduced vectors:
devIndices = {0, 1, ..., K-1}
devData = {17, 28,..., 30}
(if I'm right with the arithmetic :) )
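As a sanity check on the arithmetic, here is a minimal host-side sketch of the result I expect. It is a plain sequential loop, not Thrust's implementation, and reduceByKeyHost is a hypothetical helper name introduced only for this illustration.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Sequential reference for the expected result: each run of consecutive
// equal keys collapses into one key, and the matching values are summed.
// Results go into fresh vectors, so input and output never share storage.
// (Hypothetical helper for illustration; not part of Thrust.)
std::pair<std::vector<int>, std::vector<float> >
reduceByKeyHost(const std::vector<int>& keys, const std::vector<float>& values)
{
    std::vector<int> outKeys;
    std::vector<float> outValues;
    for (std::size_t i = 0; i < keys.size(); ++i) {
        if (outKeys.empty() || outKeys.back() != keys[i]) {
            outKeys.push_back(keys[i]);      // a new run of keys starts here
            outValues.push_back(values[i]);
        } else {
            outValues.back() += values[i];   // same run: accumulate
        }
    }
    return std::make_pair(outKeys, outValues);
}
```

Called on the first two groups and the last group of the example above (keys 0, 1 and K-1 with K = 3), this gives keys {0, 1, 2} and values {17, 28, 30}.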
Now, I know for sure that the elements of the reduced devIndices should sum to a value T given by the following relation:
T = (K-1) * K / 2
(e.g. [0 1 2 3] -> 6 = (K-1) * K / 2 = 3 * 4 / 2)
I tried this on my machine and it works fine for small numbers of elements, but it fails for large ones (100,000 fails...).
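For comparison, the expected value of T can be computed directly on the host. This is only a small sketch; expectedKeySum is a hypothetical name, and the 64-bit type is my own precaution against overflow for large K.

```cpp
#include <cstdint>

// Expected sum of the reduced keys 0, 1, ..., K-1, i.e. T = K * (K-1) / 2.
// A 64-bit type is used so that the product K * (K-1) does not overflow
// for large K. (Hypothetical helper for illustration.)
std::int64_t expectedKeySum(std::int64_t K)
{
    return K * (K - 1) / 2;
}
```

For K = 4 this gives 6, as in the example above, and for K = 10000 it gives 49995000.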
Below is the code I use to manipulate my two vectors as described above and output the sum of devIndices at the end. You can play around with the parameter k, which basically sets the number of elements.
#include <cuda.h>
#include <thrust/random.h>
#include <thrust/device_vector.h>
#include <thrust/functional.h>
#include <thrust/transform.h>
#include <thrust/tabulate.h>
#include <thrust/reduce.h>
#include <thrust/iterator/counting_iterator.h>
#include <iostream> // std::cout is used in main
typedef thrust::device_vector<int> tDevVecInt;
typedef thrust::device_vector<float> tDevVecFlt;
struct rando : public thrust::unary_function<unsigned int, float>
{
    unsigned int mainSeed;
    rando(unsigned int _mainSeed):mainSeed(_mainSeed) {}
    __host__ __device__ float operator()(unsigned int x)
    {
        unsigned int seed = x * mainSeed;
        thrust::random::taus88 mac(seed);
        thrust::uniform_real_distribution<float> dist(0,1);
        return dist(mac);
    }
};
struct modSim : public thrust::unary_function<int, int>
{
    int sz;
    modSim(int in)
    {
        this->sz = in;
    }
    __host__ __device__ int operator()(const int &x)
    {
        return x/sz;
    }
};
int main()
{
    int mod = 10;
    int k = 10000;
    int szData = k*mod;
    tDevVecFlt devData(szData, 0.);
    tDevVecInt devIndices(szData, 0.);
    thrust::transform(thrust::make_counting_iterator(0), thrust::make_counting_iterator(0) + szData, devData.begin(), rando(123456789));
    thrust::tabulate(devIndices.begin(), devIndices.end(), modSim(mod));
    thrust::reduce_by_key(devIndices.begin(), devIndices.end(), devData.begin(), devIndices.begin(), devData.begin());
    std::cout << thrust::reduce(devIndices.begin(), devIndices.begin()+ k, 0) << std::endl;
    return 0;
}
Worst of all: when I run the same piece of code several times, I get different results! The random vector has nothing to do with this (it is seeded, and I checked it, by the way).
So, the question part now: reduce_by_key seems like the right tool to me. Am I wrong somewhere?
The documentation for thrust::reduce_by_key states:
Precondition: The input ranges shall not overlap either output range.
You have broken that precondition in your code:
thrust::reduce_by_key(devIndices.begin(), devIndices.end(), devData.begin(), devIndices.begin(), devData.begin());
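So your code is broken and does not demonstrate a Thrust bug. thrust::reduce_by_key is not a Thrust operation that can be done in place. With the precondition violated, the result is not something you can rely on, which is consistent with the run-to-run variation you observe.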
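A minimal sketch of one way to satisfy the precondition, assuming it is acceptable to allocate separate output vectors. The helper name reduceIntoFreshVectors and the outIndices/outData parameters are hypothetical names introduced for this illustration; only thrust::reduce_by_key and its returned thrust::pair of output end iterators come from Thrust itself.

```cuda
#include <cstddef>
#include <thrust/device_vector.h>
#include <thrust/pair.h>
#include <thrust/reduce.h>

typedef thrust::device_vector<int>   tDevVecInt;
typedef thrust::device_vector<float> tDevVecFlt;

// Reduce devData by devIndices into separate output vectors, so that the
// input ranges do not overlap either output range.
// Returns the number of reduced (key, value) pairs.
// (Hypothetical helper for illustration; not part of Thrust.)
std::size_t reduceIntoFreshVectors(const tDevVecInt& devIndices,
                                   const tDevVecFlt& devData,
                                   tDevVecInt& outIndices,
                                   tDevVecFlt& outData)
{
    // Worst case: every key is distinct, so the outputs need as many
    // slots as the inputs.
    outIndices.resize(devIndices.size());
    outData.resize(devData.size());

    // reduce_by_key returns the ends of the two output ranges.
    thrust::pair<tDevVecInt::iterator, tDevVecFlt::iterator> ends =
        thrust::reduce_by_key(devIndices.begin(), devIndices.end(),
                              devData.begin(),
                              outIndices.begin(), outData.begin());

    // Trim the outputs to the number of pairs actually written.
    std::size_t n = ends.first - outIndices.begin();
    outIndices.resize(n);
    outData.resize(n);
    return n;
}
```

In your main, you would then sum the reduced keys with thrust::reduce(outIndices.begin(), outIndices.end(), 0) rather than reading them back from devIndices.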