c++ cuda thrust

Using thrust::reduce to compute the sum over a vector of 8 bit integers without overflow


I've got a device vector of type uint8_t and I want to compute its sum using thrust::reduce, if possible. The problem is that the sum overflows, since it will be much larger than 255. I thought the code below would compute the sum by accumulating the result as a 32-bit integer, but that doesn't seem to be the case. Is there a good way to accomplish this?

uint8_t * flags_d;
...
const int32_t N_CMP_BLOCKS = thrust::reduce( 
    thrust::device_pointer_cast( flags_d ), 
    thrust::device_pointer_cast( flags_d ) + N,
    (int32_t) 0,
    thrust::plus<int32_t>() );

Solution

  • I think the only solution that will work is to use thrust::transform_reduce to explicitly cast the 8 bit input data to a 32 bit quantity before the accumulation operation in the reduction. So I would expect something like this:

    #include <iostream>
    #include <thrust/transform_reduce.h>
    #include <thrust/functional.h>
    #include <thrust/execution_policy.h>
    
    template<typename T1, typename T2>
    struct char2int
    {
      __host__ __device__ T2 operator()(const T1 &x) const
      {
        return static_cast<T2>(x);
      }
    };
    
    int main()
    {
      unsigned char data[6] = {128, 100, 200, 102, 101, 123};
      int result = thrust::transform_reduce(thrust::host,
                                            data, data + 6,
                                            char2int<unsigned char,int>(),
                                            0,
                                            thrust::plus<int>());
    
      std::cout << "Result is " << result << std::endl;
     
      return 0;
    }
    

    That should be closer to what you had in mind.
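
The example in the answer runs on the host. As a rough sketch of how the same idea might be applied to the raw device pointer from the question, the snippet below wraps the pointer with thrust::device_pointer_cast and widens each element to 32 bits before the reduction. The names widen_to_int32 and sum_flags are hypothetical, and the test data (1000 bytes, each 255) is made up for illustration. It assumes compilation with nvcc.

```cuda
#include <cstddef>
#include <cstdint>
#include <iostream>

#include <thrust/device_ptr.h>
#include <thrust/device_vector.h>
#include <thrust/functional.h>
#include <thrust/transform_reduce.h>

// Hypothetical functor: widens one 8-bit flag to 32 bits before it is summed.
struct widen_to_int32
{
  __host__ __device__ std::int32_t operator()(std::uint8_t x) const
  {
    return static_cast<std::int32_t>(x);
  }
};

// Hypothetical helper: sums n bytes stored at the raw device pointer flags_d.
std::int32_t sum_flags(const std::uint8_t* flags_d, std::size_t n)
{
  // Wrap the raw pointer so Thrust dispatches the reduction to the device.
  thrust::device_ptr<const std::uint8_t> first =
      thrust::device_pointer_cast(flags_d);

  return thrust::transform_reduce(first,
                                  first + static_cast<std::ptrdiff_t>(n),
                                  widen_to_int32(),              // cast each element
                                  std::int32_t(0),               // 32-bit initial value
                                  thrust::plus<std::int32_t>()); // 32-bit accumulation
}

int main()
{
  // Illustrative data only: 1000 bytes, each holding 255.
  const std::size_t n = 1000;
  thrust::device_vector<std::uint8_t> flags(n, std::uint8_t(255));

  std::int32_t total =
      sum_flags(thrust::raw_pointer_cast(flags.data()), flags.size());

  std::cout << "Sum is " << total << std::endl; // 1000 * 255 = 255000
  return 0;
}
```

Because the functor's return type is int32_t, every value reaching thrust::plus is already 32 bits wide, so the result no longer depends on how the reduction picks its intermediate type from the uint8_t input.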