I'm developing an MPI+CUDA project and have tried to profile my app with both nvvp and nvprof. The app itself runs completely fine, but in neither case is a profile generated.
nvprof mpirun -np 2 MPI_test
[...]
======== Warning: No CUDA application was profiled, exiting
I tried the simpleMPI CUDA sample, with the same result.
I'm using CUDA 5.0 on a GTX 580 and Open MPI 1.7.3 (a feature release, not a stable one yet, because I'm testing the CUDA-aware option).
Any ideas? Thank you very much.
mpirun itself is not a CUDA application. You have to put the profiler inside the MPI launch instead, like this: mpirun -np 2 nvprof MPI_test. But you also have to make sure that each instance of nvprof (two instances in this case) writes to a different output file. Open MPI exports the OMPI_COMM_WORLD_RANK environment variable, which gives the process rank in MPI_COMM_WORLD. This can be used in a small wrapper script, e.g. wrap_nvprof:
#!/bin/bash
# One output file per rank; "$@" keeps arguments with spaces intact (unlike $*).
nvprof -o "profile.$OMPI_COMM_WORLD_RANK" "$@"
This should be run like mpirun -n 2 ./wrap_nvprof executable <arguments>, and after it has finished there should be two output files with profile information: profile.0 for rank 0 and profile.1 for rank 1.
Edit: The nvvp documentation contains an example nvprof wrapper script that does the same in a more graceful way and handles both Open MPI and MVAPICH2. A version of the script is reproduced in this answer to a question that yours is more or less a duplicate of.
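For illustration, here is a minimal sketch of what a wrapper covering both launchers might look like. It is not the script from the nvvp documentation. It assumes that MVAPICH2 exports the rank as MV2_COMM_WORLD_RANK, and the mpi_rank helper and the PID fallback are my own additions for the sketch:

```shell
#!/bin/bash
# Sketch of a launcher-agnostic nvprof wrapper (mpi_rank is a hypothetical helper).

# Print this process's MPI rank from whichever launcher variable is set.
# Falls back to the shell's PID so output files still do not collide.
mpi_rank() {
    if [ -n "${OMPI_COMM_WORLD_RANK:-}" ]; then
        echo "$OMPI_COMM_WORLD_RANK"    # exported by Open MPI
    elif [ -n "${MV2_COMM_WORLD_RANK:-}" ]; then
        echo "$MV2_COMM_WORLD_RANK"     # assumed to be exported by MVAPICH2
    else
        echo "pid$$"                    # unknown launcher: use the PID instead
    fi
}

# Only launch the profiler when a command was actually passed in.
if [ "$#" -gt 0 ]; then
    exec nvprof -o "profile.$(mpi_rank)" "$@"
fi
```

It would be used the same way as the simple wrapper, e.g. mpirun -n 2 ./wrap_nvprof executable <arguments>, after making the script executable with chmod +x.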