I attempted to compile PyTorch from source but ran into a very strange linking error. After investigating, I discovered that a templated function defined in a C++ file and its declaration in a CUDA file generate two different mangled names.
C++ definition: `_ZNK2at10TensorBase14const_data_ptrIdTnNSt9enable_ifIXntsr3stdE10is_const_vIT_EEiE4typeELi0EEEPKS3_v`
CUDA declaration: `_ZNK2at10TensorBase14const_data_ptrIdTnNSt9enable_ifIXntsr4__T0E10is_const_vIT_EEiE4typeELi0EEEPKS3_v`
The two names look almost identical except for one fragment: `3std` in the C++ symbol vs `4__T0` in the CUDA symbol.
When I run both mangled names through llvm-cxxfilt, they demangle to the same function: `double const* at::TensorBase::const_data_ptr<double, 0>() const`.
Tools version:
Could you please explain why the two compilers produce different mangled names for the same function?
Here's a demangling hint: in Itanium-mangled names, each identifier is written with its length in front of it. The part that differs is "3" followed by 3 letters vs "4" followed by 4 letters, so the difference is a single length-prefixed string: `3std` ("std") vs `4__T0` ("__T0").
Next, what sort of string got replaced? We have "std" vs "__T0", and in both symbols the string is preceded by `sr` (the Itanium-ABI marker for a qualified name in an expression, i.e. `::`) and followed by `E10is_const_v` (`E` terminates the qualifier, `10` is the length of `is_const_v`). So one compiler mangled the constraint expression using `std::is_const_v` and the other using `__T0::is_const_v`.
Indeed, in a minimal example with your function:

```cpp
#include <type_traits>

namespace at {
class TensorBase {
public:
    template <typename T,
              typename std::enable_if<!std::is_const_v<T>, int>::type = 0>
    const T* const_data_ptr() const { return nullptr; }
};
}

int main() {
    at::TensorBase tensorBase;
    tensorBase.const_data_ptr<double>();
}
```
the template member function mangles to exactly your C++ symbol. Replacing `std::is_const_v` with `__T0::is_const_v` would give your CUDA symbol. The natural way to end up with that would be headers that redefine parts of `std::` in their own namespace, but that's probably not what's going on here, and a deeper look into nvcc would be required to figure it out. The only references to `__T0` I've found in a CUDA context were unrelated to this, but they always involved compiler internals: either extra template arguments injected by nvcc or device/host attributes.
Note that mangling the template constraint (the `enable_if` expression) into the symbol name is quite new, so it's possible that not everything works correctly once nvcc is involved. For example, with CUDA/NVCC 12.2.140 and Clang 18.1.8, I can't reproduce your problem: the code above compiles to your C++ symbol with both nvcc and clang. Something else nvcc-specific must be causing the difference, and you might as well file a bug report against NVCC and/or Clang, as I don't see a good reason to emit a symbol containing `4__T0` no matter what internal magic the compiler is doing.