Search code examples
cudathrust

Can thrust::gather be used "in-place"?


Consider the following code:

#include <time.h>       // --- time
#include <stdlib.h>     // --- srand, rand
#include<fstream>

#include <thrust\host_vector.h>
#include <thrust\device_vector.h>
#include <thrust\sort.h>
#include <thrust\iterator\zip_iterator.h>

#include "TimingGPU.cuh"

/********/
/* MAIN */
/********/
int main() {

    const int N = 16384;

    std::ifstream h_indices_File, h_x_File;
    h_indices_File.open("h_indices.txt");
    h_x_File.open("h_x.txt");

    std::ofstream h_x_result_File;
    h_x_result_File.open("h_x_result.txt");

    thrust::host_vector<int> h_indices(N);
    thrust::host_vector<double> h_x(N);
    thrust::host_vector<double> h_sorted(N);

    for (int k = 0; k < N; k++) {
        h_indices_File >> h_indices[k];
        h_x_File >> h_x[k];
    }

    thrust::device_vector<int> d_indices(h_indices);
    thrust::device_vector<double> d_x(h_x);

    thrust::gather(d_indices.begin(), d_indices.end(), d_x.begin(), d_x.begin());
    h_x = d_x;
    for (int k = 0; k < N; k++) h_x_result_File << h_x[k] << "\n";

    //thrust::device_vector<double> d_x_sorted(N);
    //thrust::gather(d_indices.begin(), d_indices.end(), d_x.begin(), d_x_sorted.begin());
    //h_x = d_x_sorted;
    //for (int k = 0; k < N; k++) h_x_result_File << h_x[k] << "\n";

}

The code loads from file an array of indices h_indices.txt and a double array h_x.txt. Then, it transfers those arrays to the GPU to d_indices and d_x and uses thrust::gather to achieve Matlab's equivalent

d_x(d_indices)

The two txt files can be downloaded from h_indices.txt and h_x.txt. The code creates an output result file h_x_result.txt.

If I use the "in-place" version of thrust::gather (the last uncommented three lines of the code), then I obtain that the result is different from d_x(d_indices), while if I use the not "in-place" version (the last commented three lines of the code), then the result is correct.

In Matlab, I'm using

load h_indices.txt; load h_x.txt; load h_x_result.txt
plot(h_x(h_indices + 1)); hold on; plot(h_x_result, 'r'); hold off

The "in-place" case returns the following comparison enter image description here

On the other side, the "in-place" case returns enter image description here

I'm using Windows 10, CUDA 8.0, Visual Studio 2013, compiling in Release Mode and running on an NVIDIA GTX 960 cc. 5.2.


Solution

  • Thrust gather can't be used in place.

    But I would go as far as to suggest that no "naïve" gather operation can be safely performed in-place, and that the Matlab snippet you presented as in-place (presumably d_x = d_x(d_indices)) isn't an in-place operation at all.