Finding CPU usage in the XML files exported by Visual Studio 2017's profiler from a VSPX file?


I need to parse the XML files generated by Visual Studio's performance profiler, and get information about CPU usage of each function in my program (similar to what Visual Studio displays in the diagnostics tools and performance profiler).

I tried looking through all the XML files that I can generate using the tool (call tree summary, function summary, process summary, etc.) but I don't seem to find the information regarding CPU usage.

Which XML views should I export to get this information? Also, where can I find CPU usage? What are InclSamples and ExclSamples? Here is an example from my function summary XML:

<Function FunctionName="MatrixMultiply.Program.FunctionName" InclSamples="1,444" ExclSamples="881" InclSamplesPercent="97.30" ExclSamplesPercent="59.37" />

Solution

  • To generate the XML files correctly, choose a target and tick the CPU Usage checkbox in the Performance Profiler (Alt + F2), which is available from Debug > Performance Profiler in the menu. This diagsession window asks you to choose the performance reports you need before you start profiling:

    - CPU Usage
    - Memory Usage
    - GPU Usage
    

    Once you export the report into a VSPX file, VS2017 lets you export this binary file into readable CSV or XML files. In the menu that shows up, you can choose the reports you'd like. Some of the reports the profiler can generate are:

    • Caller Callee Summary
    • Call Tree Summary
    • Function Summary
    • Header Summary
    • IP Summary
    • Line Summary
    • Marks Summary
    • Module Summary
    • Process Summary
    • Process Thread Summary
    • Thread Summary

    It looks like you're most interested in the CallTreeSummary. The exported XML file is named <reportname>_CallTreeSummary.xml and contains a <PerformanceReport> element, which in turn contains a <CallTreeSummary>.

    Each CallTree element in the CallTreeSummary contains the FunctionName, the InclSamples, the ExclSamples and their respective percentages. For example, take the following sample C++ code (a memory leak sample):

    int main() {
        while (true) {
            int *p = new int;
        }
        return 0;
    }
    

    A part of the CallTree shows up as follows:

    <CallTree Level="8" FunctionName="operator new" InclSamples="20,560" ExclSamples="78" InclSamplesPercent="97.46" ExclSamplesPercent="0.37" ModuleName="Sample.exe" />
    <CallTree Level="9" FunctionName="[ucrtbased.dll]" InclSamples="20,482" ExclSamples="55" InclSamplesPercent="97.09" ExclSamplesPercent="0.26" ModuleName="ucrtbased.dll" />
    <CallTree Level="10" FunctionName="[ucrtbased.dll]" InclSamples="20,427" ExclSamples="83" InclSamplesPercent="96.83" ExclSamplesPercent="0.39" ModuleName="ucrtbased.dll" />
    <CallTree Level="11" FunctionName="[ucrtbased.dll]" InclSamples="20,344" ExclSamples="177" InclSamplesPercent="96.44" ExclSamplesPercent="0.84" ModuleName="ucrtbased.dll" />
    <CallTree Level="12" FunctionName="[ucrtbased.dll]" InclSamples="20,119" ExclSamples="2,092" InclSamplesPercent="95.37" ExclSamplesPercent="9.92" ModuleName="ucrtbased.dll" />
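
Rows like these can be read with any XML parser. The following is a minimal sketch in Python using only the standard library. It assumes the `<CallTree>` elements sit somewhere under `<PerformanceReport>` and `<CallTreeSummary>` (the exact nesting in a real export may differ, so it searches the whole document), and it assumes the counts use a comma as the thousands separator, as in the excerpt. The names `SAMPLE`, `to_int` and `read_call_tree` are made up for this illustration.

```python
import xml.etree.ElementTree as ET

# Two rows from the excerpt, wrapped in the container elements named in the
# answer. The wrapper layout is an assumption; a real export may nest differently.
SAMPLE = """
<PerformanceReport>
  <CallTreeSummary>
    <CallTree Level="8" FunctionName="operator new" InclSamples="20,560" ExclSamples="78" InclSamplesPercent="97.46" ExclSamplesPercent="0.37" ModuleName="Sample.exe" />
    <CallTree Level="12" FunctionName="[ucrtbased.dll]" InclSamples="20,119" ExclSamples="2,092" InclSamplesPercent="95.37" ExclSamplesPercent="9.92" ModuleName="ucrtbased.dll" />
  </CallTreeSummary>
</PerformanceReport>
"""


def to_int(text):
    """Convert a count such as '20,560' to an int by dropping the separators."""
    return int(text.replace(",", ""))


def read_call_tree(xml_text):
    """Return one dict per <CallTree> element, in document order."""
    root = ET.fromstring(xml_text)
    rows = []
    # iter() walks the whole tree, so the exact nesting does not matter.
    for node in root.iter("CallTree"):
        rows.append({
            "level": int(node.get("Level")),
            "function": node.get("FunctionName"),
            "module": node.get("ModuleName"),
            "incl_samples": to_int(node.get("InclSamples")),
            "excl_samples": to_int(node.get("ExclSamples")),
            "incl_percent": float(node.get("InclSamplesPercent")),
            "excl_percent": float(node.get("ExclSamplesPercent")),
        })
    return rows


rows = read_call_tree(SAMPLE)
for row in rows:
    print(row["level"], row["function"], row["incl_samples"], row["excl_percent"])
```

For a file on disk, `ET.parse(path).getroot()` could replace `ET.fromstring(xml_text)`. Sorting the rows by `excl_samples` then gives a rough list of the hottest functions.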
    

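    InclSamples (inclusive samples) is the number of samples collected while the function, or any function it called, was executing. ExclSamples (exclusive samples) is the number of samples collected while only the function's own code was executing. Each sample is one snapshot of the call stack, so the counts can be read as CPU ticks.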
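
As a rough check of how the sample counts relate to the percentage attributes, the sketch below takes the `<Function>` row from the question and works backwards to a total sample count. It assumes each percentage is the function's samples divided by all samples in the report; the two estimates agreeing supports that reading but does not prove it.

```python
import xml.etree.ElementTree as ET

# The <Function> row quoted in the question.
ROW = (
    '<Function FunctionName="MatrixMultiply.Program.FunctionName" '
    'InclSamples="1,444" ExclSamples="881" '
    'InclSamplesPercent="97.30" ExclSamplesPercent="59.37" />'
)

node = ET.fromstring(ROW)
incl = int(node.get("InclSamples").replace(",", ""))
excl = int(node.get("ExclSamples").replace(",", ""))
incl_pct = float(node.get("InclSamplesPercent"))
excl_pct = float(node.get("ExclSamplesPercent"))

# Assumption: percent = 100 * samples / total samples in the report.
# If that holds, both pairs should imply the same total.
total_from_incl = round(incl / (incl_pct / 100))
total_from_excl = round(excl / (excl_pct / 100))
print(total_from_incl, total_from_excl)  # prints "1484 1484"
```

Under that assumption, InclSamplesPercent and ExclSamplesPercent are the attributes closest to a per-function CPU usage figure in these exports.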

    To illustrate, consider the following program:

    int bar() {
        return 1;
    }
    
    int foo() {
        return bar();
    }
    
    int main() {
        int x = foo();
        return 0;
    }
    

    A sample execution could show the following data:

    <Function FunctionName="main" InclSamples="100" ExclSamples="10" InclSamplesPercent="100.00" .../>
    <Function FunctionName="foo" InclSamples="90" ExclSamples="40" InclSamplesPercent="90.00" .../>
    <Function FunctionName="bar" InclSamples="50" ExclSamples="50" InclSamplesPercent="50.00" .../>
    

    This is interpreted as follows:

    • Running main() accounts for 100 samples in total, but only 10 of them were spent in main() itself. The remaining 90 were spent in functions called from main().
    • Running foo() accounts for 90 samples (the run time of foo() plus bar()), but foo() itself takes only 40 of them.
    • bar() takes 50 samples to run, all of them in its own code.

    Using the same reasoning, you can interpret the CallTree sample shown earlier. InclSamplesPercent is the function's inclusive samples as a percentage of all samples collected for the run. For example, in the data above main() accounts for 100% of the CPU usage, of which 90% was taken up by foo() and 50% by bar(). That makes the ExclSamplesPercent of foo() 90 - 50 = 40.00% and that of main() 100 - 90 = 10.00%.
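
The same arithmetic can be written as a short Python sketch. It uses the inclusive counts from the main/foo/bar example and a hand-written call chain (`callees`, a name made up for this illustration); a real report would supply the parent/child relationships through the call tree instead.

```python
# Inclusive sample counts from the main/foo/bar example.
inclusive = {"main": 100, "foo": 90, "bar": 50}

# Call chain main -> foo -> bar, written out by hand for this sketch.
callees = {"main": ["foo"], "foo": ["bar"], "bar": []}

# main is the root of the chain, so its inclusive count covers every sample.
total_samples = inclusive["main"]


def exclusive(name):
    """Exclusive samples = inclusive samples minus the callees' inclusive samples."""
    return inclusive[name] - sum(inclusive[c] for c in callees[name])


report = {}
for name in inclusive:
    excl = exclusive(name)
    report[name] = (excl, 100.0 * excl / total_samples)
    print(f"{name}: ExclSamples={excl} ExclSamplesPercent={report[name][1]:.2f}")
```

This prints 10 (10.00%) for main, 40 (40.00%) for foo and 50 (50.00%) for bar, matching the ExclSamples values in the example data.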