windows performance etw xperf

How to interpret ETW graphs for sampled and precise CPU usage when they contradict each other


I'm having difficulties pinning down where our application is spending its time.

The flame graphs of an ETW trace for the sampled and the precise CPU usage contradict each other. Below are the graphs for a 1-second interval.

According to the "CPU Usage (Sampled)" graph

  1. vbscript.dll!COleScript::ParseScriptText is a big contributor to overall CPU time.
  2. ws2_32.dll!recv is a small contributor.

According to the "CPU Usage (Precise)" graph, it is essentially the other way around:

  1. vbscript.dll!COleScript::ParseScriptText is a small contributor, taking up only 3.95 ms of CPU.
  2. ws2_32.dll!recv is a big contributor, taking up 915.09 ms of CPU.

What am I missing or misinterpreting?


[Figure: flame graph of CPU Usage (Sampled)]


[Figure: flame graph of CPU Usage (Precise)]


Solution

  • There is a simple explanation:

    CPU Usage (Precise) is based on context switch data, so it can give an extremely accurate measure of how much time a thread spends on-CPU, that is, how much time it is running. However, because it is based on context switch data, it only knows what the call stack is when a thread goes on or off the CPU. It has no idea what the thread is doing in between. Therefore, CPU Usage (Precise) data knows how much CPU time a thread is using, but it has no idea where that time is spent. In your trace, that would explain why almost all of the time is charged to ws2_32.dll!recv: presumably that is where the thread was waiting when it was switched back in.

    CPU Usage (Sampled) is less accurate with regard to how much CPU time a thread consumes, but it is quite good (statistically speaking, absent systemic bias) at telling you where that time is spent.
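To make the difference concrete, here is a toy model of the two attribution schemes. It is only a sketch and not how WPA is implemented: the function names come from the question, while the durations, the 1 ms sample period, and the helpers `precise_view` and `sampled_view` are invented for illustration. The thread wakes up inside recv, spends most of its on-CPU time parsing, then blocks in recv again.

```python
# Toy model only; every duration below is made up for illustration.
# Each on-CPU interval records the stack seen at the context switch and
# what the thread actually executed (function, microseconds) before blocking.
intervals = [
    {
        "switch_in_stack": "recv",
        "work": [("recv", 600), ("ParseScriptText", 9_300), ("recv", 100)],
    }
    for _ in range(10)
]


def precise_view(intervals):
    """Charge each whole on-CPU interval to the stack seen at the switch."""
    totals = {}
    for iv in intervals:
        duration = sum(us for _, us in iv["work"])
        stack = iv["switch_in_stack"]
        totals[stack] = totals.get(stack, 0) + duration
    return totals


def sampled_view(intervals, period_us=1_000):
    """Sample once per period and charge one period to the running function."""
    totals = {}
    clock = 0                     # on-CPU time elapsed so far
    next_sample = period_us // 2  # first sample lands mid-period
    for iv in intervals:
        for fn, us in iv["work"]:
            end = clock + us
            while next_sample < end:
                totals[fn] = totals.get(fn, 0) + period_us
                next_sample += period_us
            clock = end
    return totals


precise = precise_view(intervals)
sampled = sampled_view(intervals)
print("precise:", precise)  # exact total, but all of it charged to recv
print("sampled:", sampled)  # approximate total, but split by actual work
```

In this model the precise view reports the exact 100,000 µs of CPU time and charges all of it to recv, while the sampled view splits it into 10,000 µs for recv and 90,000 µs for ParseScriptText. The true split is 7,000 µs and 93,000 µs, so sampling is slightly off on the amounts but right about where the time goes.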

    With CPU Usage (Sampled) you still need to be careful about inclusive versus exclusive time (time spent in a function plus its descendants versus time spent in the function itself) in order to interpret the data correctly, but it sounds like that data is what you want.
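As a rough illustration of inclusive versus exclusive counts, the sketch below tallies a handful of sampled call stacks. The stacks are hypothetical: `main`, `RunScript`, `Tokenize`, and `ReadSocket` are made-up frames, and `inclusive_exclusive` is a helper written for this example, not part of any ETW tool.

```python
from collections import Counter

# Hypothetical sampled stacks, root frame first; each tuple is one sample.
samples = [
    ("main", "RunScript", "ParseScriptText"),
    ("main", "RunScript", "ParseScriptText", "Tokenize"),
    ("main", "RunScript", "ParseScriptText", "Tokenize"),
    ("main", "ReadSocket", "recv"),
]


def inclusive_exclusive(samples):
    """Count samples per function, inclusively and exclusively."""
    inclusive, exclusive = Counter(), Counter()
    for stack in samples:
        # Only the leaf frame was actually executing when the sample hit.
        exclusive[stack[-1]] += 1
        # Every frame on the stack gets inclusive credit; set() keeps a
        # recursive function from being counted twice for one sample.
        for fn in set(stack):
            inclusive[fn] += 1
    return inclusive, exclusive


inclusive, exclusive = inclusive_exclusive(samples)
for fn in ("main", "ParseScriptText", "Tokenize", "recv"):
    print(f"{fn}: inclusive={inclusive[fn]} exclusive={exclusive[fn]}")
```

Here ParseScriptText appears in three of the four samples (inclusive) but is the executing leaf in only one (exclusive). A wide bar in a flame graph is an inclusive measure, so a function can look expensive purely because of what it calls.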

    For more details, see https://randomascii.wordpress.com/2015/09/24/etw-central/, which documents all of the columns for these two tables and includes case studies of using both of them (one to investigate CPU usage scenarios, the other to investigate CPU idle scenarios).