cuda online-compilation cuda-driver nvtx

`cuModuleLoadDataEx` returns `CUDA_ERROR_UNSUPPORTED_PTX_VERSION`

I am porting a CUDA application from the CUDA runtime API to the CUDA driver API, and I am having trouble both understanding online compilation and getting it to work. I get a `CUDA_ERROR_UNSUPPORTED_PTX_VERSION` failure when calling `cuModuleLoadDataEx(&module, ptx, 0, 0, 0)`, where `ptx` is a null-terminated `char *` containing the PTX source generated by the program. The program is compiled via `nvrtcCompileProgram` with `--gpu-architecture sm_90`, since I am on a cluster with NVIDIA H100 GPUs (compute capability 9.0). For the same reason, I find it very unlikely that the provided toolkit suffers from version incompatibility issues (as explained here). I assume the module loading function fails on the very first line, `.version 8.5`, but I don't know how to control that. The PTX header is:

.version 8.5
.target sm_90
.address_size 64

Can it be that PTX version 8.5 is incompatible with the real sm_90 architecture?

Solution

tl;dr: The incompatibility is with your NVIDIA driver, not with the GPU.

(answer due to @RobertCrovella's comments)

`CUDA_ERROR_UNSUPPORTED_PTX_VERSION` does not mean that your GPU doesn't support the PTX version. GPUs never receive PTX at all: the driver compiles the PTX into SASS, the native instruction set of the GPU, and that is what the GPU actually executes.

However, each CUDA version - or rather, the NVIDIA driver and its associated library (libcuda.so on Linux) - only understands versions of the PTX ISA up to a certain number. In your case:

You have CUDA version 12.2.1 installed.
When you installed it, you also installed the NVIDIA driver version 535.86.10 - that's what CUDA 12.2.1 bundles.
As @RobertCrovella points out, CUDA 12.2 only knows about PTX version 8.2 (and earlier) - while you were trying to load a PTX file created with version 8.5.

... so you got exactly what you were supposed to: The error of "I am not familiar with the PTX version you've shown me". Nothing was said about your GPU.
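To make the mismatch concrete, here is a minimal sketch of the comparison the driver effectively performs: read the `.version` directive from the PTX text and check it against the newest PTX version the driver understands. `PtxVersion`, `parse_ptx_version` and `ptx_version_supported` are hypothetical helper names, not part of the CUDA API, and the parsing is deliberately naive (it takes the first `.version` it finds).

```cpp
#include <cassert>
#include <cstdio>
#include <cstring>

// Hypothetical helper type (not part of the CUDA API): a PTX ISA version.
struct PtxVersion {
    int major_num;
    int minor_num;
};

// Locate the first `.version <major>.<minor>` directive in PTX source text.
// Returns false if no well-formed directive is found.
bool parse_ptx_version(const char* ptx, PtxVersion* out) {
    assert(ptx != nullptr && out != nullptr);
    const char* directive = std::strstr(ptx, ".version");
    if (directive == nullptr) return false;
    return std::sscanf(directive, ".version %d.%d",
                       &out->major_num, &out->minor_num) == 2;
}

// True if `found` is no newer than `max_supported`.
bool ptx_version_supported(PtxVersion found, PtxVersion max_supported) {
    if (found.major_num != max_supported.major_num)
        return found.major_num < max_supported.major_num;
    return found.minor_num <= max_supported.minor_num;
}
```

With the header from the question, `parse_ptx_version` yields 8.5; checked against a maximum of 8.2 (the CUDA 12.2 driver), `ptx_version_supported` returns false, which corresponds to the error you saw.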

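You should probably update the NVIDIA driver, or CUDA overall (including the driver). Alternatively, generate the PTX with an NVRTC from a toolkit that is no newer than your driver's CUDA version (12.2 here), so that it emits a PTX version the driver understands.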
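If you want to detect this situation before loading a module, one option is to ask the driver which CUDA version it supports via `cuDriverGetVersion`, which reports an integer of the form 1000 * major + 10 * minor (for example 12020 for CUDA 12.2), and map that to a PTX ISA version. The sketch below covers only the mapping step, so it compiles without the CUDA headers. `PtxIsa` and `max_ptx_for_driver` are hypothetical names, and the mapping is only filled in for CUDA 12.0 through 12.5, where CUDA 12.x corresponds to PTX 8.x; other releases should be looked up in the PTX ISA release notes.

```cpp
#include <cassert>

// Hypothetical helper type (not part of the CUDA API): a PTX ISA version.
struct PtxIsa {
    int major_num;
    int minor_num;
};

// Map the integer reported by cuDriverGetVersion() to the newest PTX ISA
// version that driver understands. Assumption: only CUDA 12.0 - 12.5 are
// covered here (CUDA 12.x -> PTX 8.x); any other input returns false.
bool max_ptx_for_driver(int driver_version, PtxIsa* out) {
    assert(out != nullptr);
    const int cuda_major = driver_version / 1000;         // 12020 -> 12
    const int cuda_minor = (driver_version % 1000) / 10;  // 12020 -> 2
    if (cuda_major != 12 || cuda_minor > 5) return false;
    out->major_num = 8;
    out->minor_num = cuda_minor;
    return true;
}
```

In a real program you would pass in the value obtained from `cuDriverGetVersion(&driver_version)` and compare the result with the `.version` line of the PTX you are about to load. For the driver bundled with CUDA 12.2 this gives PTX 8.2, which is older than the 8.5 in your file.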