I ran a trace of my application.
In the resulting report file:
1.
When I select "CUDA -> CUDA Summary" from the drop-down, the table shows:
Runtime API Calls: % Time = 80.66
Launches: % Device Time = 15.46
All the other time percentages are nearly 0%.
So my question is: where is the remaining 19.34% of Time and 84.54% of Device Time? Or are these percentages relative to completely different 'Total Time' values?
2.
I used Thrust vectors to copy my data back and forth. In the "Memory Copy" section of this report, all the % Time values for memory copies in my run are negligible.
But when I click the 'summary' link of the Runtime API Calls item (whose % Time is as high as 80.66), the 'Runtime API Calls Summary' page immediately shows the culprit: 'cudaMemcpy', with a 'Capture Time %' of 73.75.
So my question here is that
CUDA SUMMARY
In the CUDA Summary, the % Time under Runtime API Calls is the percentage of CPU time taken by the CUDA Runtime. I do not recall whether the % is capped at 100% (all CPU threads are flattened) or whether the maximum is NumCpuCores * 100%.
API CALLS
To find the most expensive Runtime API Calls, perform the following steps:
It is possible to capture the call stack for CUDA Runtime API Calls so that you can jump to the source code from the report. This can be enabled in the Activity with the following steps:
WARNING: Setting Call Stack Trace to Always increases the API call overhead. Only enable this when the program is CPU-limited and you are trying to identify the source code generating the API calls.
The call stack trace can be accessed from any report page that references the API call by using the correlation pane in the bottom left corner of the report page. The screenshot below shows the call stack for the cudaEventSynchronize call in the CUDA Runtime API Calls report page.
It is possible to query for the longest API calls in the Timeline report page using the correlation information for the Process\Thread\Function Calls or Process\CUDA\CUDA Context\Runtime API rows.
The call stack can also be retrieved at this point using the correlation pane.