Just to see what kind of code CUDA is generating, I like to compile to ptx in addition to an object file. Since some of my loop unrolling can take quite a while, I'd like to be able to compile *.cu → *.ptx → *.o instead of wasting time with both *.cu → *.ptx and *.cu → *.o, which I'm currently doing.
Simply adding -ptx to the nvcc *.cu line gives the desired ptx output.
Using ptxas -c to compile *.ptx to *.o works, but causes an error when linking my executable: Relocations in generic ELF (EM: 190).
Attempting to compile the *.ptx with nvcc fails silently, outputting nothing.
Is there some option I need to pass to ptxas? How should I properly compile via ptx with separate compilation? Alternatively, can I just tell nvcc to keep the ptx?
"Alternatively, can I just tell nvcc to keep the ptx?"
Yes, you can tell nvcc to keep all intermediate files, one of which will be the .ptx file.
nvcc -keep ...
Keeping all the intermediate files is a bit messy, but I'm sure you can come up with a script to tidy things up, and only save the files you want.
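For what it's worth, here is a rough sketch of what such a tidy-up script might look like. It is only one way to do it: the file and directory names (kernel.cu, keep_tmp, ptx_out) are made up for illustration, and I'm assuming you build with -dc since you mention separate compilation. The -keep-dir option just tells nvcc to put the intermediates in a scratch directory, so cleaning up is a matter of moving the .ptx out and deleting the rest.

```shell
#!/bin/sh
# Sketch: compile once with -keep, then retain only the .ptx intermediates.
# kernel.cu, keep_tmp and ptx_out are hypothetical names; adjust to your build.

# Move every .ptx out of the intermediates directory, then delete the rest.
keep_only_ptx() {
    keep_dir=$1
    out_dir=$2
    mkdir -p "$out_dir"
    for f in "$keep_dir"/*.ptx; do
        [ -e "$f" ] || continue   # the glob matched nothing
        mv "$f" "$out_dir"/
    done
    rm -rf "$keep_dir"
}

# Compile one .cu file to an object, keeping intermediates in a scratch dir.
# -dc (relocatable device code) is an assumption based on the question's
# mention of separate compilation; use -c instead if you don't need it.
compile_keep_ptx() {
    src=$1
    keep_dir=$2
    mkdir -p "$keep_dir"
    nvcc -dc -keep -keep-dir "$keep_dir" -o "${src%.cu}.o" "$src"
}

# Only run the compile step when nvcc and the (hypothetical) source exist.
if command -v nvcc >/dev/null 2>&1 && [ -f kernel.cu ]; then
    compile_keep_ptx kernel.cu keep_tmp && keep_only_ptx keep_tmp ptx_out
fi
```

This way the .cu file is compiled only once: you get the .o as usual and the .ptx ends up in ptx_out, with everything else thrown away.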