I am implementing a custom binary predicate used by the thrust::max_element search algorithm. It also keeps track of a value of interest that the algorithm alone cannot yield. It looks something like this:
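struct cmp_class {
    // val1 points to the integer that tracks the running minimum
    cmp_class(int *val1) {
        this->val1 = val1;
    }
    bool operator()(int i, int j) {
        // side effect: only the second argument is ever inspected
        if (j < *val1) { *val1 = j; }
        return i < j;
    }
    int *val1;
};
int val1 = ARRAY[0];
cmp_class cmp(&val1);
std::max_element(ARRAY, ARRAY+length, cmp);
.... use val1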
My actual binary predicate is quite a bit more complex, but the example above captures the gist of it: I am passing a pointer to an integer to the binary predicate cmp, which then writes to that integer to keep track of some value (here the running minimum). Since max_element first calls cmp(ARRAY[0], ARRAY[1]) and then cmp(running maximum, ARRAY[i]), I only look at value j inside cmp and therefore initialize val1 = ARRAY[0] to ensure ARRAY[0] is taken into account.
If I do the above on the host, using std::max_element for example, this works fine. The value of val1 is what I expect given known data. However, when I use Thrust to execute this on the GPU, its value is off. I suspect this is due to the parallelization of thrust::max_element, which is applied recursively to sub-arrays, the results of which form another array that thrust::max_element is run on, and so on. Does this hold water?
In general, the binary predicates used for thrust reductions are expected to be commutative. I'm using "commutative" here to mean that the predicate result should be valid regardless of the order in which the arguments are presented.
At the initial stage of a Thrust parallel reduction, the arguments are likely to be presented in the order you might expect (i.e. in the order of the vectors passed to the reduce function, or in the order in which the values occur in a single vector). Later in the parallel reduction, however, the origin of the arguments may get mixed up during the parallel sweep. Any binary comparison functor that assumes an ordering of its arguments is probably broken for Thrust parallel reductions.
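To make the point about argument origin concrete, here is a minimal host-side sketch of a pairwise (tree-shaped) max reduction that logs every pair it compares. It is only a toy model and not how Thrust is actually implemented; the names pairwise_max and call_log are made up for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical log type: one (i, j) entry per comparison.
using call_log = std::vector<std::pair<int, int>>;

// Toy pairwise max reduction, run serially for illustration only.
// Assumes `values` is non-empty. Each round compares neighbours and keeps
// the larger one; an unpaired last element is carried to the next round.
int pairwise_max(std::vector<int> values, call_log &calls) {
    while (values.size() > 1) {
        std::vector<int> partial;
        for (std::size_t k = 0; k + 1 < values.size(); k += 2) {
            int i = values[k];
            int j = values[k + 1];
            calls.emplace_back(i, j);          // what a predicate would be handed
            partial.push_back(i < j ? j : i);  // keep the larger of the pair
        }
        if (values.size() % 2 == 1) {
            partial.push_back(values.back());  // odd element out, carried over
        }
        values = partial;
    }
    return values[0];
}
```

For the input 10 5 3 7 12 this sketch compares (10,5), (3,7), (10,7) and (10,12). The value 3 only ever shows up as the first argument, so a predicate that inspects only j would never see it.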
In your case, the boolean result of your comparison functor should be valid regardless of the order of arguments presented, and in that respect it appears to be properly commutative.
However, regarding the custom storage functionality you have built in around val1, it seems pretty clear that the results in val1 could differ depending on the order in which arguments are passed to the functor. Consider this simple max-finding sequence among a set of values passed to the functor as (i,j) (assume val1 starts out at a large value):
values:             10   5        3        7        12
comparison pairs:        (10,5)   (10,3)   (10,7)   (10,12)
comparison result:       10       10       10       12
val1 storage:            5        3        3        3
Now suppose that we simply reverse the order in which the arguments are presented to the functor:
values:             10   5        3        7        12
comparison pairs:        (5,10)   (3,10)   (7,10)   (12,10)
comparison result:       10       10       10       12
val1 storage:            10       10       10       10
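The two traces above can be reproduced on the host with a small sketch. The functor has the same shape as the one in the question; tracking_less and tracked_min are hypothetical names introduced here, and the loop is a plain serial scan rather than anything Thrust does internally.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>
#include <vector>

// Same shape as the functor in the question: the return value depends only
// on (i, j), but the side effect only ever inspects the second argument.
struct tracking_less {
    explicit tracking_less(int *running_min) : running_min(running_min) {}
    bool operator()(int i, int j) const {
        if (j < *running_min) { *running_min = j; }
        return i < j;
    }
    int *running_min;
};

// Hypothetical helper: serial max scan over a non-empty vector that hands
// the predicate (running maximum, next element), either as written or with
// the two arguments swapped. Returns what the side channel ends up holding.
int tracked_min(const std::vector<int> &values, bool swap_arguments) {
    int running_min = INT_MAX;  // "assume val1 starts out at a large value"
    tracking_less cmp(&running_min);
    int running_max = values[0];
    for (std::size_t k = 1; k < values.size(); ++k) {
        int next = values[k];
        if (swap_arguments) {
            // cmp(next, running_max) is true when the running maximum stays
            if (!cmp(next, running_max)) { running_max = next; }
        } else {
            // cmp(running_max, next) is true when next becomes the maximum
            if (cmp(running_max, next)) { running_max = next; }
        }
    }
    return running_min;
}
```

Called on 10 5 3 7 12, tracked_min leaves 3 in the side channel with the original argument order and 10 with the swapped order, even though the maximum found is 12 in both cases.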
Another issue is that you have no atomic protection on val1 that I can see:
if (j < *val1) { *val1 = j; }
The above line of code may be OK in a serial realization. In a parallel multi-threaded algorithm, multiple threads may access (read and write) *val1 simultaneously, which is a data race with undefined results.
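One way to sidestep both problems, sketched here on the host under the assumption that the tracked value really is just the minimum, is to carry it inside the reduced value instead of writing it through a pointer. The combine step is then a pure function whose result does not depend on argument order or grouping. The names min_max, merge_min_max and find_min_max are hypothetical; on the device the same operator would need to be marked `__host__ __device__` and could presumably be handed to thrust::transform_reduce, which is not shown or tested here.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical value type: both quantities of interest travel together.
struct min_max {
    int lo;
    int hi;
};

// Pure combine step: no shared state is touched, and swapping or
// regrouping the arguments gives the same result.
struct merge_min_max {
    min_max operator()(const min_max &a, const min_max &b) const {
        return { std::min(a.lo, b.lo), std::max(a.hi, b.hi) };
    }
};

// Serial reference reduction over a non-empty vector.
min_max find_min_max(const std::vector<int> &values) {
    min_max acc{ values[0], values[0] };
    for (int v : values) {
        acc = merge_min_max{}(acc, min_max{ v, v });
    }
    return acc;
}
```

If the minimum and maximum are all that is needed, thrust::minmax_element returns iterators to both in one pass and no custom functor is required.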