I have a small CUDA program that I want to profile with nvprof. I want to write the program so that:

- when I run nvprof my_prog, it invokes cudaProfilerStart and cudaProfilerStop;
- when I run my_prog, it invokes neither of these APIs and so avoids any profiling overhead.

The problem therefore becomes how to make my code aware of the presence of nvprof at run time, without an additional command-line argument.
Have you measured and verified that the cudaProfilerStart/cudaProfilerStop calls introduce measurable overhead when nvprof is not attached? I highly doubt that this is the case.
If this is a problem, you can use #ifdef directives to exclude these calls from your release builds.
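A minimal sketch of the #ifdef approach, assuming a build-time macro that you define yourself. The names ENABLE_PROFILING, PROFILER_START, PROFILER_STOP, run_workload and profiled_region are hypothetical and chosen only for illustration; cudaProfilerStart and cudaProfilerStop are the real CUDA runtime calls declared in cuda_profiler_api.h.

```cpp
// Sketch: compile the profiler calls in only when ENABLE_PROFILING is defined.
// ENABLE_PROFILING, PROFILER_START, PROFILER_STOP, run_workload and
// profiled_region are hypothetical names used for this example.
#ifdef ENABLE_PROFILING
#include <cuda_profiler_api.h>  // declares cudaProfilerStart / cudaProfilerStop
#define PROFILER_START() ((void)cudaProfilerStart())
#define PROFILER_STOP()  ((void)cudaProfilerStop())
#else
// Release build: both macros expand to nothing, so no profiler API is called.
#define PROFILER_START() ((void)0)
#define PROFILER_STOP()  ((void)0)
#endif

// Stand-in for the real work (kernel launches, memory copies, and so on).
int run_workload() {
    int sum = 0;
    for (int i = 1; i <= 4; ++i) {
        sum += i;
    }
    return sum;
}

// Wraps only the region of interest in the start/stop markers.
int profiled_region() {
    PROFILER_START();
    int result = run_workload();
    PROFILER_STOP();
    return result;
}
```

A profiling build would then be compiled with something like nvcc -DENABLE_PROFILING, and a release build without the flag. Note that the start/stop calls only restrict what nvprof collects when it is launched with --profile-from-start off; otherwise profiling begins at program start regardless.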
There is no documented way to detect whether nvprof is running. Such a mechanism would rather defeat the purpose of profiling, since the profiled application could "sense" the profiler and change its behavior.