Tags: visual-studio, cuda, thrust

Attempt to use an extended __device__ lambda in a context that requires querying its return type in host code


I'm receiving the compiler error

static_assert failed: 'Attempt to use an extended __device__ lambda in a context that requires querying its return type in host code. Use a named function object, a __host__ __device__ lambda, or cuda::proclaim_return_type instead.'

when compiling this code:

thrust::device_vector<float2> a;
thrust::device_vector<float> b;

float param1, param2;
float2 param3;

thrust::transform_reduce(
    thrust::make_zip_iterator(thrust::make_tuple(a.begin(), b.begin())),
    thrust::make_zip_iterator(thrust::make_tuple(a.end(), b.end())),
    [param1, param2, param3] __device__ (thrust::tuple<float2, float> const& tuple)
    {
        /* do something and return a float2 */
    },
    float2{},
    [] __device__ (float2 const& first, float2 const& second)
    {
        float2 result{};
        result.x = first.x + second.x;
        result.y = first.y + second.y;
        return result;
    });

How can I rewrite this code so that it compiles? I'm new to CUDA/Thrust and it's quite obscure to me how to deal with this problem.

(BTW, do I really need the complicated "transformation" function here? thrust::plus<float2> doesn't work.)


Solution

  • How can I rewrite this code so that it compiles?

    A possible solution seems to be indicated in the assert message you posted in your question:

    static_assert failed: 'Attempt to use an extended __device__ lambda in a context that requires querying its return type in host code. Use a named function object, a __host__ __device__ lambda, or cuda::proclaim_return_type instead.'

    When I decorate each lambda with __host__ __device__ instead of just __device__, the code compiles for me on CUDA 12.1.
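
    For example, here is a minimal sketch of the call from your question with both lambdas marked __host__ __device__ (the transform lambda body remains a placeholder, as in your snippet; extended lambdas still need nvcc's --extended-lambda flag, which you evidently already use):

    thrust::transform_reduce(
        thrust::make_zip_iterator(thrust::make_tuple(a.begin(), b.begin())),
        thrust::make_zip_iterator(thrust::make_tuple(a.end(), b.end())),
        [param1, param2, param3] __host__ __device__ (thrust::tuple<float2, float> const& tuple)
        {
            /* do something and return a float2 */
            return float2{};
        },
        float2{},
        [] __host__ __device__ (float2 const& first, float2 const& second)
        {
            float2 result{};
            result.x = first.x + second.x;
            result.y = first.y + second.y;
            return result;
        });

    Alternatively, the message also mentions cuda::proclaim_return_type. If your toolkit ships a recent enough libcu++ (it lives in <cuda/functional>), you can keep the lambdas __device__-only and wrap them, roughly like this:

    #include <cuda/functional>  // cuda::proclaim_return_type (recent CUDA toolkits)

    // Wrap the __device__ lambda so Thrust does not need to query its
    // return type in host code.
    auto transform_op = cuda::proclaim_return_type<float2>(
        [param1, param2, param3] __device__ (thrust::tuple<float2, float> const& tuple)
        {
            /* do something and return a float2 */
            return float2{};
        });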

    (BTW, do I really need the complicated "transformation" function here? thrust::plus<float2> doesn't work.)

    CUDA doesn't provide arithmetic operators for its built-in vector types such as float2, and AFAIK Thrust doesn't either. The requirements for thrust::plus state:

    then x+y must be defined

    So you'll need to provide your own operator to add float2 values, e.g.

    __host__ __device__ __forceinline__
    float2 operator+(float2 left, float2 right) noexcept {
        return float2{left.x + right.x,
                      left.y + right.y};
    }
    

    This will allow you to use thrust::plus<float2>.
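
    Putting both parts together, a minimal sketch of the full call (a, b, and the params are the ones from your question, the transform lambda body is still a placeholder, and the float2 operator+ above must be declared before this point):

    // headers needed for this snippet (at the top of the .cu file)
    #include <thrust/device_vector.h>
    #include <thrust/functional.h>            // thrust::plus
    #include <thrust/iterator/zip_iterator.h>
    #include <thrust/transform_reduce.h>
    #include <thrust/tuple.h>

    float2 sum = thrust::transform_reduce(
        thrust::make_zip_iterator(thrust::make_tuple(a.begin(), b.begin())),
        thrust::make_zip_iterator(thrust::make_tuple(a.end(), b.end())),
        [param1, param2, param3] __host__ __device__ (thrust::tuple<float2, float> const& tuple)
        {
            /* do something and return a float2 */
            return float2{};
        },
        float2{},                 // initial value of the reduction
        thrust::plus<float2>());  // picks up the operator+ defined above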