Thrust/CUDA reduce_by_key gives non-deterministic result


I'm facing a problem with the reduce_by_key function of the Thrust library. It looks like a bug to me but I'd like to be sure before reporting.

First, my setup: CUDA 7.0, Windows 8, NVIDIA GeForce 820M. Everything is compiled with Visual Studio 2010 and nvcc in release mode, 64-bit.

Now, the exercise that illustrates the problem.

I have a vector of random numbers called devData generated on my device.
I tabulate a vector of indices called devIndices of the same size defined as follows:

  • devIndices = {0, 0, 0, 0, 1, 1, 1, 1, ... K-1, K-1, K-1, K-1}
  • devData = { 1, 4, 5, 7, 5, 8, 9, 6, ... 7, 8, 9, 6}

Each value in devIndices is thus repeated mod times (mod = 4 in this example).

Then, I just want to reduce_by_key devData using devIndices to obtain the reduced vectors that follow:

  • devIndices = {0, 1, ..., K-1}
  • devData = {17, 28,..., 30}

(if I'm right with the arithmetic :) )

Now, I know for sure that the elements of the reduced devIndices should sum to a value T given by the following relation:

  • T = [(K-1) * K /2] (ex: [0 1 2 3] -> 6 = (K-1)*K/2 = 3 * 4 /2)
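To make the expected result concrete, here is a minimal host-side sketch in plain C++ of the reduction and the checksum described above. The names reduceByKeyHost and expectedKeySum are hypothetical helpers for illustration, not Thrust functions, and the sketch assumes that equal keys are stored consecutively, as they are in devIndices.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Host-side reference for a key-wise reduction: each run of consecutive
// equal keys is collapsed into one key, and the matching values are summed.
// reduceByKeyHost is a hypothetical name; it is not part of Thrust.
// Results go into fresh vectors, so the inputs are never overwritten.
std::pair<std::vector<int>, std::vector<float> >
reduceByKeyHost(const std::vector<int>& keys, const std::vector<float>& vals)
{
    std::vector<int> outKeys;
    std::vector<float> outVals;
    for (std::size_t i = 0; i < keys.size(); ++i) {
        if (outKeys.empty() || outKeys.back() != keys[i]) {
            outKeys.push_back(keys[i]);   // a new run of keys starts here
            outVals.push_back(vals[i]);
        } else {
            outVals.back() += vals[i];    // same run: accumulate the value
        }
    }
    return std::make_pair(outKeys, outVals);
}

// Expected checksum T of the reduced keys 0, 1, ..., K-1.
long long expectedKeySum(long long K)
{
    return (K - 1) * K / 2;
}
```

Feeding it the first eight elements of the example (keys 0,0,0,0,1,1,1,1 and values 1,4,5,7,5,8,9,6) yields keys {0, 1} and values {17, 28}, which gives something to compare the device result against.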

I tried to do this on my machine and it works fine for small numbers of elements but it fails for large ones. (100,000 fails...)

Below is the code I use to manipulate my two vectors as described above and output the sum of devIndices at the end. You can play around with the parameter k that basically sets the number of elements.

#include <cuda.h>
#include <thrust/random.h>
#include <thrust/device_vector.h>
#include <thrust/functional.h>
#include <thrust/transform.h>
#include <thrust/tabulate.h>
#include <thrust/reduce.h>
#include <thrust/sort.h>
#include <thrust/iterator/counting_iterator.h>
#include <iostream>
typedef thrust::device_vector<int>     tDevVecInt;
typedef thrust::device_vector<float>   tDevVecFlt;

struct rando : public thrust::unary_function<unsigned int, float>
{
    unsigned int mainSeed;
    rando(unsigned int _mainSeed):mainSeed(_mainSeed) {}
    __host__ __device__ float operator()(unsigned int x) 
    {
        unsigned int seed = x * mainSeed;
        thrust::random::taus88 mac(seed);
        thrust::uniform_real_distribution<float> dist(0,1);
        return dist(mac);
    }
};

struct modSim : public thrust::unary_function<int, int>  
{
    int sz;
    modSim(int in)
    {
        this->sz = in;
    }
    __host__ __device__ int operator()(const int &x) 
    {
        return x/sz;
    }
};

int main() 
{
    int mod = 10;
    int k = 10000;
    int szData = k*mod;
    
    tDevVecFlt devData(szData, 0.);
    tDevVecInt devIndices(szData, 0.);
    
    thrust::transform(thrust::make_counting_iterator(0), thrust::make_counting_iterator(0) + szData, devData.begin(), rando(123456789));    
    thrust::tabulate(devIndices.begin(), devIndices.end(), modSim(mod)); 
    thrust::reduce_by_key(devIndices.begin(), devIndices.end(), devData.begin(), devIndices.begin(), devData.begin());
    std::cout << thrust::reduce(devIndices.begin(), devIndices.begin()+ k, 0) << std::endl;
    return 0;
}

Worst of all: when I run the same piece of code several times, I get different results! The random vector has nothing to do with this (it is seeded, and I checked it).

So the question part now:

  • Am I wrong somewhere? reduce_by_key seems the right tool to me.
  • Can anyone reproduce this irreproducibility?
  • If this is indeed a bug, what is the usual way to report?

Solution

  • Am I wrong somewhere?

    The documentation for thrust::reduce_by_key states:

    Precondition The input ranges shall not overlap either output range.

    You have broken that precondition in your code:

    thrust::reduce_by_key(devIndices.begin(), devIndices.end(), devData.begin(), devIndices.begin(), devData.begin());
    

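    So your code is invalid and does not demonstrate a Thrust bug. thrust::reduce_by_key cannot be used in place: here both devIndices and devData are passed as input and as output. Because the outputs are written while other threads may still be reading the inputs, the result can change from run to run, which is consistent with what you observe. Write the results to separate output vectors instead.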
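A minimal sketch of a call that respects the precondition, assuming the same typedefs as in the question. The helper name reduceIntoSeparateOutputs and the vectors outKeys and outVals are hypothetical and introduced only for illustration; the thrust::reduce_by_key overload used is the five-iterator one from the question.

```cuda
#include <thrust/device_vector.h>
#include <thrust/pair.h>
#include <thrust/reduce.h>

typedef thrust::device_vector<int>   tDevVecInt;
typedef thrust::device_vector<float> tDevVecFlt;

// Hypothetical helper (not part of Thrust): reduces vals by keys into
// outKeys/outVals, which must be different vectors from the inputs,
// and returns the number of reduced segments.
int reduceIntoSeparateOutputs(const tDevVecInt& keys, const tDevVecFlt& vals,
                              tDevVecInt& outKeys, tDevVecFlt& outVals)
{
    // The outputs can never hold more entries than the inputs.
    outKeys.resize(keys.size());
    outVals.resize(vals.size());

    // Inputs and outputs are distinct ranges, so nothing overlaps.
    thrust::pair<tDevVecInt::iterator, tDevVecFlt::iterator> ends =
        thrust::reduce_by_key(keys.begin(), keys.end(), vals.begin(),
                              outKeys.begin(), outVals.begin());

    // Trim the outputs to the number of segments actually written.
    int numSegments = static_cast<int>(ends.first - outKeys.begin());
    outKeys.resize(numSegments);
    outVals.resize(numSegments);
    return numSegments;
}
```

In the question's main, this would replace the in-place call, and the checksum would then be computed over the output keys, for example with thrust::reduce(outKeys.begin(), outKeys.end(), 0).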