Cons of virtual methods in CUDA

As far as I understand, virtual method calls are late-bound and thus cannot be inlined by the compiler. Apparently, nvcc relies heavily on inlining. Do virtual methods have any serious disadvantages when used in a CUDA kernel? Are there situations where they should be avoided, and can they affect performance?

Solution

If the compiler can devirtualize the call, it may be able to transform it into a regular method call or even inline it. NVCC's device compiler is built on LLVM (via NVVM), which is capable of doing this in some cases as an optimization. You will have to check the generated code (for example the PTX or SASS) to know whether this is the case.
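As a minimal sketch of a call that the compiler has a good chance of devirtualizing: the object below is a local whose dynamic type is known at the call site, and the class is marked `final`. The names `Op`, `Scale`, `scale_one` and `scale_kernel` are made up for illustration, and the `HD` macro exists only so that the same file also builds as plain C++. Whether the call actually gets inlined is not guaranteed.

```cpp
#ifdef __CUDACC__
#define HD __host__ __device__
#else
#define HD  // plain C++ build: no CUDA qualifiers
#endif

// Hypothetical interface with a single virtual method.
struct Op {
    HD virtual float apply(float x) const = 0;
};

// `final` tells the compiler that no further override can exist.
struct Scale final : Op {
    float factor;
    HD explicit Scale(float f) : factor(f) {}
    HD float apply(float x) const override { return factor * x; }
};

// The object is created right here, so its dynamic type is known
// statically even though the call goes through a base reference.
HD inline float scale_one(float x, float factor) {
    Scale op(factor);
    const Op& base = op;
    return base.apply(x);  // candidate for devirtualization and inlining
}

#ifdef __CUDACC__
// Hypothetical kernel: one element per thread.
__global__ void scale_kernel(const float* in, float* out, int n, float factor) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = scale_one(in[i], factor);
}
#endif
```

To see what the compiler did, one option is to look at the output of `nvcc -ptx` or `cuobjdump -sass` and check whether an indirect call remains in the kernel.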

If the compiler cannot devirtualize the call, then the call may hurt performance, particularly if it sits on a hot path. A virtual call requires:

a vtable lookup;
an indirect branch.
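To make those two steps concrete, here is a rough hand-written equivalent of what a virtual call does. This is only a conceptual sketch: the real object layout and vtable format are implementation details of the compiler, and `ManualShape`, `ShapeVTable` and the helper functions are hypothetical names.

```cpp
#ifdef __CUDACC__
#define HD __host__ __device__
#else
#define HD  // plain C++ build: no CUDA qualifiers
#endif

struct ManualShape;

// Stand-in for a vtable: one function pointer per virtual method.
struct ShapeVTable {
    float (*area)(const ManualShape*);
};

// Stand-in for an object carrying a hidden vtable pointer.
struct ManualShape {
    const ShapeVTable* vptr;
    float a, b;
};

// Two "overrides" of the same method.
HD inline float rect_area(const ManualShape* s) { return s->a * s->b; }
HD inline float tri_area(const ManualShape* s) { return 0.5f * s->a * s->b; }

// What `s->area()` boils down to when it cannot be devirtualized.
HD inline float call_area(const ManualShape* s) {
    const ShapeVTable* vt = s->vptr;  // 1. vtable lookup: extra memory loads
    return vt->area(s);               // 2. indirect branch through a pointer
}
```

On the host this can be exercised by filling a `ShapeVTable` with `rect_area` or `tri_area` and pointing a `ManualShape` at it. In device code the table would have to be filled in on the device, since host and device function addresses are not interchangeable.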

The vtable lookup costs at least one extra memory access, which is slow and may occupy cache lines that could be put to better use, and indirect branches are expensive in general. Moreover, if the threads within a warp do not all resolve the virtual method to the same address (for example, when processing an array of objects with different concrete types), the warp diverges and the different targets are executed one after another, which is yet another performance hit.
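The divergence case can be sketched as follows. Here neighbouring threads deliberately pick different concrete types, so threads of the same warp would call different overrides. `Shape`, `Rect`, `Tri`, `area_for_index` and `mixed_kernel` are hypothetical names, the objects are built inside the device function so that their vtable pointers are valid in device code, and an optimizer may well simplify a toy example like this one.

```cpp
#ifdef __CUDACC__
#define HD __host__ __device__
#else
#define HD  // plain C++ build: no CUDA qualifiers
#endif

// Hypothetical class hierarchy.
struct Shape {
    HD virtual float area() const = 0;
};

struct Rect : Shape {
    float w, h;
    HD Rect(float w_, float h_) : w(w_), h(h_) {}
    HD float area() const override { return w * h; }
};

struct Tri : Shape {
    float b, h;
    HD Tri(float b_, float h_) : b(b_), h(h_) {}
    HD float area() const override { return 0.5f * b * h; }
};

// Even indices use Rect, odd indices use Tri. Inside one warp the virtual
// call therefore has two different targets, which are run one after another.
HD inline float area_for_index(int i) {
    Rect r(2.0f, 3.0f);
    Tri t(2.0f, 3.0f);
    const Shape* s = &r;
    if (i % 2 != 0) s = &t;
    return s->area();
}

#ifdef __CUDACC__
// Hypothetical kernel: one element per thread.
__global__ void mixed_kernel(float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = area_for_index(i);
}
#endif
```

If the data can be arranged so that all threads of a warp see the same concrete type (for instance by grouping objects by type), every thread takes the same target and this particular cost goes away. The lookup and the indirect branch remain.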

That being said, if you are not calling the virtual method on a hot path, the impact should be negligible. Without seeing the actual code, it's impossible to say more.