I can't find any implementation in the docs that shows how to retrieve the indexes of elements in a vector (1D matrix) that match a given value. The closest example is:
Mat b;
Mat a = b == 5;
This should give me a matrix of booleans, which I can then use to extract the indexes of the values equal to 5. Is there a more performant way? It should send all the values in the vector to the GPU, compare them in parallel, and return the index of the only (or first) value that equals 5. It also shouldn't be anything from the <algorithm> header, like:
std::find(...);
I'm only interested in parallel GPU solutions.
This can be done easily with a transform reduction.
First, you transform each matching vector element to its index and each non-matching element to a large sentinel value, such as the vector size.
Then, in the reduction stage, you take the minimum, which is the index of the first matching element in the array. If the result equals the sentinel, there was no match.
A parallel reduction needs only O(log(n)) steps (with O(n) total work), so the GPU can do it efficiently.
You could implement it either with Thrust or by writing your own kernel.
https://thrust.github.io/doc/group__transformed__reductions.html
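To make the two stages concrete, here is a minimal host-side sketch of the same transform-then-min logic in C++17, using std::transform_reduce from <numeric>. It only illustrates the shape of the computation and does not run on the GPU. The helper name first_match_index and the int element type are assumptions made for this example.

```cpp
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical helper (not from any library): returns the index of the first
// element equal to `target`, or values.size() if nothing matches.
std::size_t first_match_index(const std::vector<int>& values, int target) {
    const std::size_t n = values.size();

    // Index sequence 0, 1, ..., n-1, paired element-wise with the values.
    std::vector<std::size_t> indices(n);
    std::iota(indices.begin(), indices.end(), std::size_t{0});

    return std::transform_reduce(
        values.begin(), values.end(), indices.begin(),
        n,  // initial value: the sentinel, which is the identity for min here
        // Reduce stage: keep the smaller of two candidate indexes.
        [](std::size_t a, std::size_t b) { return a < b ? a : b; },
        // Transform stage: match -> its own index, no match -> sentinel n.
        [n, target](int v, std::size_t i) { return v == target ? i : n; });
}
```

Calling first_match_index on {1, 5, 3, 5} with target 5 returns 1, and a vector with no 5 returns its size. With Thrust, the same two operations could be passed to thrust::transform_reduce, for example over a zip of the device vector and a thrust::counting_iterator, so that no index array has to be stored.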