
How can I obtain timing values from the output of nvprof or of the NVIDIA Visual Profiler?


I'm using nvprof to profile a program that does both CPU work and GPU work (i.e. I use nvprof markers etc.), and I get the binary files which nvprof produces. I can import these into NVVP (NVIDIA Visual Profiler; Linux version), and with a little effort also export them to XML.

However, the XML does not contain timing data about what happens on the CPU side and when. It mentions that my CPU-side activities exist, but no more. Also, the end of the XML has a binary blob, probably Base64-encoded or something similar, inside a PDM tag. It's not clear to me whether that is of any help.


Solution

  • It is quite an old question, but maybe somebody will find the answer useful.

    nvprof output files are actually SQLite3 databases, which you can open either with the standalone sqlite3 program or programmatically. The timeline information is in the following tables (all timestamps are in nanoseconds):

    • CUPTI_ACTIVITY_KIND_CONCURRENT_KERNEL - data about kernels
    • CUPTI_ACTIVITY_KIND_MEMCPY - data about memory copies (non-P2P)
    • CUPTI_ACTIVITY_KIND_MEMCPY2 - data about P2P memory copies
    • CUPTI_ACTIVITY_KIND_MEMSET - data about memsets
    • CUPTI_ACTIVITY_KIND_RUNTIME - data about CUDA Runtime API calls
    • CUPTI_ACTIVITY_KIND_DRIVER - data about CUDA Driver API calls
    • CUPTI_ACTIVITY_KIND_MARKER - data about NVTX markers. Its layout differs slightly from the other tables because it has no start and end fields. Instead, the start and the end of a marker are stored as 2 separate entries (the end entry has name=0)
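Since the marker table stores the start and the end as separate rows, getting a duration means pairing them up. The following is a minimal sketch in Python, not a definitive recipe. It assumes that the two rows of one marker share the same value in a column called `id` and that the time is stored in a column called `timestamp`; neither column name is stated above, so check them against your file (for example with `.schema CUPTI_ACTIVITY_KIND_MARKER` in the sqlite3 shell). The function name `marker_ranges` and the file name in the usage comment are made up for illustration.

```python
import sqlite3


def marker_ranges(conn):
    """Pair start/end marker rows.

    Returns a list of (id, name, start_ns, end_ns, duration_ns) tuples.
    Assumption: both rows of a marker share the same `id`, and the time
    is in a `timestamp` column. The end row is the one with name = 0.
    """
    query = """
        SELECT s.id, s.name, s.timestamp, e.timestamp
        FROM CUPTI_ACTIVITY_KIND_MARKER AS s
        JOIN CUPTI_ACTIVITY_KIND_MARKER AS e
          ON e.id = s.id AND e.name = 0
        WHERE s.name != 0
        ORDER BY s.timestamp
    """
    return [
        (marker_id, name, start, end, end - start)
        for marker_id, name, start, end in conn.execute(query)
    ]


# Usage (hypothetical file name):
#   conn = sqlite3.connect("profile.nvprof")
#   for row in marker_ranges(conn):
#       print(row)
```

Rows that have no matching end entry are dropped by the inner join. The `name` value is returned exactly as stored in the table.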

    You can correlate API calls with kernels/memcopies/memsets using the correlationId field.
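As an illustration of the correlation, here is a rough Python sketch that joins kernels to the Runtime API call that launched them. It relies only on the `start`, `end` and `correlationId` fields mentioned above; the function name `kernels_with_launch_call` and the keys of the returned dictionaries are invented for this example. Note that `end` is an SQL keyword, so it is quoted in the query.

```python
import sqlite3


def kernels_with_launch_call(conn):
    """Join each kernel with the Runtime API call sharing its correlationId.

    All times are in nanoseconds, as stored in the database.
    """
    query = """
        SELECT k.correlationId,
               r."start", r."end",
               k."start", k."end"
        FROM CUPTI_ACTIVITY_KIND_CONCURRENT_KERNEL AS k
        JOIN CUPTI_ACTIVITY_KIND_RUNTIME AS r
          ON r.correlationId = k.correlationId
        ORDER BY k."start"
    """
    rows = []
    for corr_id, api_start, api_end, k_start, k_end in conn.execute(query):
        rows.append({
            "correlationId": corr_id,
            "api_call_ns": api_end - api_start,        # time spent in the API call
            "kernel_ns": k_end - k_start,              # kernel execution time
            "launch_latency_ns": k_start - api_start,  # API call start to kernel start
        })
    return rows


# Usage (hypothetical file name):
#   conn = sqlite3.connect("profile.nvprof")
#   for row in kernels_with_launch_call(conn):
#       print(row)
```

Kernels launched through the Driver API would presumably need the same join against CUPTI_ACTIVITY_KIND_DRIVER instead, and the memcpy and memset tables can be joined the same way.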