How to multiply two iterators and return the product to a thrust::reduce algorithm?

I have this reduce_by_key working fine, except I want to multiply dv_Vals with another vector (dv_Active) that contains 0 or 1 values so that the resultant product is a new iterator with either a value from dv_Vals or 0.

thrust::reduce_by_key(
            //dv_Keys = { 0,0,0,1,1,1,2,2,2,3,3,3 }
            dv_Keys.begin(),
            dv_Keys.end(),
            thrust::make_permutation_iterator(
                dv_Vals.begin(), //I want to multiply each element of dv_Vals with each element of dv_Active here.
                dv_Map.begin()
            ),
            thrust::make_discard_iterator(),
            dv_NewVals.begin()
        );
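To pin down the intended result, here is a minimal serial sketch in plain host C++ (no Thrust) of what the masked reduction should compute. The function name `masked_sums_by_key` and the `int` element type are assumptions made for illustration; the question does not state the element types.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical serial reference, not Thrust code: produces one sum per run of
// equal consecutive keys, where element i contributes
// vals[map[i]] * active[map[i]].
// Assumes keys and map have the same length and that every map entry is a
// valid index into both vals and active.
std::vector<int> masked_sums_by_key(const std::vector<int>& keys,
                                    const std::vector<int>& vals,
                                    const std::vector<int>& active,
                                    const std::vector<int>& map)
{
    std::vector<int> sums;
    for (std::size_t i = 0; i < keys.size(); ++i) {
        const int idx  = map[i];
        const int term = vals[idx] * active[idx];  // 0 when the element is inactive
        if (i == 0 || keys[i] != keys[i - 1]) {
            sums.push_back(term);   // a new run of keys starts here
        } else {
            sums.back() += term;    // same key as the previous element
        }
    }
    return sums;
}
```

With keys {0,0,0,1,1,1}, values {1,2,3,4,5,6}, active flags {1,0,1,1,1,0} and the identity map {0,1,2,3,4,5}, this returns {4, 9}.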

My attempt at rewriting the reduce_by_key (replaced dv_Vals with make_transform_iterator) gives me errors:

thrust::reduce_by_key(
            //dv_Keys = { 0,0,0,1,1,1,2,2,2,3,3,3 }
            dv_Keys.begin(),
            dv_Keys.end(),
            thrust::make_permutation_iterator(
                thrust::make_transform_iterator(
                    thrust::make_zip_iterator(
                        thrust::make_tuple(
                            dv_Vals.begin(),
                            dv_Active.begin()
                        )
                    ),
                    _1 * _2 //My attempt to multiply the value in dv_Vals with dv_Active.
                ),
                dv_Map.begin()
            ),
            thrust::make_discard_iterator(),
            dv_NewVals.begin()
        );

What am I doing wrong? Or is there a better way to multiply two vectors within a reduce?

Solution

The problem lies in

thrust::make_transform_iterator(
    thrust::make_zip_iterator(
        thrust::make_tuple(
            dv_Vals.begin(),
            dv_Active.begin()
        )
    ),
    _1 * _2 // does **not** compile!
)

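Dereferencing the zip_iterator yields a tuple, i.e. a single argument, so the transform_iterator needs a functor taking one argument (a "unary functor"). The placeholder expression _1 * _2 (placeholders from namespace thrust::placeholders) instead generates a functor template with two arguments (a "binary functor"). This can be solved with thrust::zip_function, created via thrust::make_zip_function, which wraps the binary functor in a unary one that unpacks the tuple: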

#include <thrust/zip_function.h>
// ...
thrust::make_transform_iterator(
    thrust::make_zip_iterator(
        thrust::make_tuple(
            dv_Vals.begin(),
            dv_Active.begin()
        )
    ),
    thrust::make_zip_function(_1 * _2)
)
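The wrapping described above can be illustrated on the host with standard C++17. This is only an analogy, not Thrust's implementation: `unpack_tuple` and `multiply` are hypothetical names, `std::tuple` stands in for `thrust::tuple`, and `multiply` stands in for the functor produced by `_1 * _2`.

```cpp
#include <tuple>
#include <utility>

// Hypothetical helper, not part of Thrust: wraps an N-ary callable so that it
// can be called with a single std::tuple holding the N arguments.
template <typename F>
auto unpack_tuple(F f)
{
    return [f](const auto& t) { return std::apply(f, t); };
}

// Binary functor standing in for the placeholder expression _1 * _2.
struct multiply
{
    template <typename A, typename B>
    auto operator()(const A& a, const B& b) const { return a * b; }
};
```

Calling `unpack_tuple(multiply{})` gives a unary callable: applied to `std::make_tuple(3, 1)` it returns 3, and applied to `std::make_tuple(7, 0)` it returns 0. That is the shape of functor a transform over zipped values needs.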

Alternatively, you can use a device lambda or a functor that unpacks the tuple itself with thrust::get:

thrust::make_transform_iterator(
    thrust::make_zip_iterator(
        thrust::make_tuple(
            dv_Vals.begin(),
            dv_Active.begin()
        )
    ),
    [] __device__ (auto val_and_active) {
        return thrust::get<0>(val_and_active) * thrust::get<1>(val_and_active);
    }
)

nvcc needs the --extended-lambda flag for this to compile.

Something like thrust::get<0>(_1) * thrust::get<1>(_1) doesn't work at the time of writing. I believe it could be implemented in Thrust by specializing thrust::get if someone asked for it by posting an issue, although the maintainers might not see the point, since thrust::make_zip_function already lets you write nicer-looking code (possibly without overhead).