I'm using nvprof to profile a program that does both CPU work and GPU work (I use nvprof markers etc.), and I get the binary files which nvprof produces. I can import these into NVVP (NVIDIA Visual Profiler; Linux version) and, with a little effort, also export them to XML.
However, the XML does not contain timing data about what my various CPU threads do when. It mentions their existence, but nothing more. Also, the end of the XML has a binary blob, probably Base64-encoded or similar, inside a PDM tag. It's not clear to me whether that is of any help.
It is quite an old question, but maybe somebody will find the answer useful.
nvprof output files are actually SQLite3 databases, which you can open either with the standalone sqlite3 program or programmatically. The timeline information is inside these tables (all timestamps are in nanoseconds):
- CUPTI_ACTIVITY_KIND_CONCURRENT_KERNEL - data about kernels
- CUPTI_ACTIVITY_KIND_MEMCPY - data about memory copies (non-P2P)
- CUPTI_ACTIVITY_KIND_MEMCPY2 - data about P2P memory copies
- CUPTI_ACTIVITY_KIND_MEMSET - data about memsets
- CUPTI_ACTIVITY_KIND_RUNTIME - data about CUDA Runtime API calls
- CUPTI_ACTIVITY_KIND_DRIVER - data about CUDA Driver API calls
- CUPTI_ACTIVITY_KIND_MARKER - data about NVTX markers. It has a little different form than the other tables, because it does not have start and end fields. Instead, start and end of a marker are 2 entries (end has name=0)

You can correlate API calls with kernels/memcopies/memsets using the correlationId field.
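
For illustration, here is a minimal Python sketch of reading the timeline with the standard sqlite3 module. It relies only on the start, end, correlationId and name fields mentioned above, plus two things that are assumptions on my part and may differ between nvprof versions: that the marker table stores its time in a column called timestamp, and that the start and end rows of one marker share the same id value. The function names are made up for the example. Check the actual schema first (for example with `.schema` in the sqlite3 shell).

```python
import sqlite3


def kernel_timeline(conn):
    """Return (kernel_start_ns, kernel_end_ns, correlationId, api_call_start_ns)
    for every kernel, ordered by kernel start time.

    The Runtime API call that launched a kernel is matched via correlationId.
    api_call_start_ns is None when no matching Runtime API row exists.
    """
    query = """
        SELECT k.start, k."end", k.correlationId, r.start
        FROM CUPTI_ACTIVITY_KIND_CONCURRENT_KERNEL AS k
        LEFT JOIN CUPTI_ACTIVITY_KIND_RUNTIME AS r
               ON r.correlationId = k.correlationId
        ORDER BY k.start
    """
    return conn.execute(query).fetchall()


def marker_ranges(conn):
    """Pair the two rows of each marker into (name, start_ns, end_ns).

    ASSUMPTION: columns `id` (shared by the start and end rows of one marker)
    and `timestamp` exist in the marker table; verify against your file.
    The end row is recognised by name = 0, as described above.
    """
    query = """
        SELECT s.name, s.timestamp, e.timestamp
        FROM CUPTI_ACTIVITY_KIND_MARKER AS s
        JOIN CUPTI_ACTIVITY_KIND_MARKER AS e ON e.id = s.id
        WHERE s.name != 0 AND e.name = 0
        ORDER BY s.timestamp
    """
    return conn.execute(query).fetchall()
```

To use it, open the profile with something like `conn = sqlite3.connect("profile.nvprof")` (the file name is a placeholder) and call `kernel_timeline(conn)` or `marker_ranges(conn)`. The name values returned here are numeric; in the files I have seen they refer to a separate string table, but treat that as something to confirm in your own database.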