I have a Fortran MPI code instrumented with OpenACC. It is a big code. No way I can provide any meaningful snippets here. It runs fine under Cray aprun:
aprun -n 15 ./mycode
I want to profile it with nvprof. I try:
aprun -n 15 -b nvprof ./mycode
The code again runs OK, but when all is said and done, I get no profiling data, just a message:
======== Warning: No CUDA application was profiled, exiting
No other error message is printed. Does anyone have any idea what would cause this behavior? I am compiling with the Cray MPI Fortran compiler. My compile flags are:
-Mdaz -traceback -Ktrap=inv -acc -ta=tesla,cuda6.5,cc35,nofma -Minfo=accel -Mcuda=cuda6.5,cc35 -I. -module .
The cudatoolkit module is loaded.
Try adding the --profile-child-processes option to nvprof:
aprun -n 15 -b nvprof --profile-child-processes ./mycode
On Cray systems, you run aprun from a login node, and aprun launches the processes on the compute nodes. By default, nvprof does not profile child processes, so the --profile-child-processes option is needed to profile the spawned processes.
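If you also want to keep the results in files, one possible extension (not part of the answer above) is to give nvprof an output file per process. The sketch below assumes nvprof's -o option with the %p placeholder, which nvprof expands to the ID of each profiled process. The file name pattern mycode.%p.nvprof and the helper nvprof_cmd are made up for illustration. The script only prints the command line so you can check it before running it on your system.

```shell
#!/bin/sh
# Hypothetical helper: build the aprun/nvprof command line for a given
# number of ranks and executable. It prints the command; it does not run it.
nvprof_cmd() {
    nranks="$1"
    exe="$2"
    # %p is expanded by nvprof to the ID of each profiled process, so every
    # child process writes its own file (the naming scheme is an assumption).
    printf 'aprun -n %s -b nvprof --profile-child-processes -o %s.%%p.nvprof %s\n' \
        "$nranks" "$(basename "$exe")" "$exe"
}

nvprof_cmd 15 ./mycode
```

This prints:
aprun -n 15 -b nvprof --profile-child-processes -o mycode.%p.nvprof ./mycode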