Operator overloading in CUDA

I successfully defined an operator+ for two float4 values with:

__device__ float4 operator+(float4 a, float4 b) {
 // ...
}

However, if I also define an operator+ for uchar4 in the same way, I get the following error: error: more than one instance of overloaded function "operator+" has "C" linkage

I get a similar error message when I declare multiple functions with the same name but different arguments. So, two questions:

Polymorphism: Is it possible to have multiple functions with the same name and different arguments in CUDA? If so, why do I get this error message?
operator+ for float4: this operator seems to be provided already by "cutil_math.h", but when I include it (#include <cutil_math.h>) the compiler complains that there is no such file or directory. Is there anything particular I should do? Note: I am using PyCUDA, the Python wrapper for CUDA.

Thanks!

Solution

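Note the 'has "C" linkage' in the error. You are compiling your code with C linkage (PyCUDA does this by default to circumvent symbol mangling issues). CUDA does support function and operator overloading, but only for functions with C++ linkage: overloads are told apart by their mangled names, and C linkage disables mangling, so two functions with the same name would map to the same symbol.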

The solution is to compile the code without the automatically generated extern "C" wrapper and to specify C linkage explicitly only for the kernels. Your code would then look something like:

__device__ float4 operator+(float4 a, float4 b) { ... }

extern "C"
__global__ void kernel() { }

rather than what PyCUDA emits by default:

extern "C"
{
    __device__ float4 operator+(float4 a, float4 b) { ... }

    __global__ void kernel() { }
}

pycuda.compiler.SourceModule has an option, no_extern_c, which controls whether the just-in-time compilation system wraps your source in extern "C". It defaults to False (the wrapper is emitted); pass no_extern_c=True to turn the wrapper off.
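
To make this concrete, here is a minimal sketch of how it could look from the PyCUDA side. The operator bodies (component-wise sums), the kernel name add_float4, the helper run_add_float4 and the launch configuration are my own assumptions for illustration; they are not taken from your code. The only parts that matter are no_extern_c=True and the extern "C" placed on the kernel alone.

```python
# CUDA source passed to PyCUDA unchanged. Only the kernel gets C linkage,
# so the two operator+ overloads keep C++ linkage and do not collide.
KERNEL_SOURCE = r"""
__device__ float4 operator+(float4 a, float4 b)
{
    // Illustrative body (assumption): component-wise sum.
    return make_float4(a.x + b.x, a.y + b.y, a.z + b.z, a.w + b.w);
}

__device__ uchar4 operator+(uchar4 a, uchar4 b)
{
    // Illustrative body (assumption): component-wise sum, wrapping modulo 256.
    return make_uchar4(a.x + b.x, a.y + b.y, a.z + b.z, a.w + b.w);
}

// Hypothetical kernel; C linkage keeps its name unmangled for get_function().
extern "C"
__global__ void add_float4(const float4 *a, const float4 *b, float4 *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = a[i] + b[i];
}
"""


def run_add_float4(n=1024):
    """Compile KERNEL_SOURCE without the extern "C" wrapper and run the kernel.

    Needs a CUDA-capable GPU, PyCUDA and NumPy; imports are local so that
    merely defining this function does not require a GPU.
    """
    import numpy as np
    import pycuda.autoinit  # noqa: F401  (creates a CUDA context on import)
    import pycuda.driver as drv
    from pycuda.compiler import SourceModule

    # no_extern_c=True: PyCUDA does not wrap the source in extern "C" { ... }.
    module = SourceModule(KERNEL_SOURCE, no_extern_c=True)
    add_float4 = module.get_function("add_float4")

    # An (n, 4) float32 array has the same memory layout as n float4 values.
    a = np.random.rand(n, 4).astype(np.float32)
    b = np.random.rand(n, 4).astype(np.float32)
    out = np.empty_like(a)

    threads = 256
    blocks = (n + threads - 1) // threads
    add_float4(drv.In(a), drv.In(b), drv.Out(out), np.int32(n),
               block=(threads, 1, 1), grid=(blocks, 1))
    return a, b, out
```

On a machine with a GPU, calling a, b, out = run_add_float4() should give an out that matches a + b. If you drop the extern "C" in front of the kernel, compilation still succeeds, but get_function("add_float4") will most likely fail because the kernel's symbol name is then mangled.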