Consider the following code:
#include <time.h> // --- time
#include <stdlib.h> // --- srand, rand
#include<fstream>
#include <thrust\host_vector.h>
#include <thrust\device_vector.h>
#include <thrust\sort.h>
#include <thrust\iterator\zip_iterator.h>
#include "TimingGPU.cuh"
/********/
/* MAIN */
/********/
int main() {
const int N = 16384;
std::ifstream h_indices_File, h_x_File;
h_indices_File.open("h_indices.txt");
h_x_File.open("h_x.txt");
std::ofstream h_x_result_File;
h_x_result_File.open("h_x_result.txt");
thrust::host_vector<int> h_indices(N);
thrust::host_vector<double> h_x(N);
thrust::host_vector<double> h_sorted(N);
for (int k = 0; k < N; k++) {
h_indices_File >> h_indices[k];
h_x_File >> h_x[k];
}
thrust::device_vector<int> d_indices(h_indices);
thrust::device_vector<double> d_x(h_x);
thrust::gather(d_indices.begin(), d_indices.end(), d_x.begin(), d_x.begin());
h_x = d_x;
for (int k = 0; k < N; k++) h_x_result_File << h_x[k] << "\n";
//thrust::device_vector<double> d_x_sorted(N);
//thrust::gather(d_indices.begin(), d_indices.end(), d_x.begin(), d_x_sorted.begin());
//h_x = d_x_sorted;
//for (int k = 0; k < N; k++) h_x_result_File << h_x[k] << "\n";
}
The code loads from file an array of indices h_indices.txt
and a double
array h_x.txt
. Then, it transfers those arrays to the GPU to d_indices
and d_x
and uses thrust::gather
to achieve Matlab's equivalent
d_x(d_indices)
The two txt files can be downloaded from h_indices.txt and h_x.txt. The code creates an output result file h_x_result.txt
.
If I use the "in-place" version of thrust::gather
(the last uncommented three lines of the code), then I obtain that the result is different from d_x(d_indices)
, while if I use the not "in-place" version (the last commented three lines of the code), then the result is correct.
In Matlab, I'm using
load h_indices.txt; load h_x.txt; load h_x_result.txt
plot(h_x(h_indices + 1)); hold on; plot(h_x_result, 'r'); hold off
The "in-place" case returns the following comparison
On the other side, the "in-place" case returns
I'm using Windows 10, CUDA 8.0, Visual Studio 2013, compiling in Release Mode and running on an NVIDIA GTX 960 cc. 5.2.
Thrust gather
can't be used in place.
But I would go as far as to suggest that no "naïve" gather operation can be safely performed in-place, and that the Matlab snippet you presented as in-place (presumably d_x = d_x(d_indices)
) isn't an in-place operation at all.