Search code examples
c#etw.net-traceprocessing

Thread Wait Times


I did try to reproduce the Thread Wait times from Context Switch events with TraceProcessing. The most simplicistic approach would be to sum up from all threads from all processes the wait times. Normally I want this for a specific thread but just show the issue here is the most simple code:

using ITraceProcessor processor = TraceProcessor.Create(myEtlFile, new TraceProcessorSettings
{
  AllowLostEvents = true,
});

IPendingResult<IContextSwitchDataSource> myContextSwitchData = processor.UseContextSwitchData();
processor.Process();
double WaitDurationInMs;
foreach (IContextSwitch cSwitch in myContextSwitchData.Result.ContextSwitches)
{
    IContextSwitchIn switchin = cSwitch.SwitchIn;

    if (switchin.WaitTime.HasValue)
    {
        WaitDurationInMs += switchin.WaitTime.Value.TotalMilliseconds;
    }
}   

But apparently the switchin.WaitTime value is nowhere near any thread wait time printed by WPA. How can I get for a Thread its

  • Wait Time (Thread was blocked)
  • Ready Time (Thread was waiting to run in ready queue)
  • CPU Time (Thread was running on one CPU)

enter image description here

An example how to do that would be nice. Also the Context Switch event numbers of WPA and TraceProcessor seem to be quite a bit off. I guess I need to know some internals how the events need to be correlated.


Solution

  • I have found the corresponding entry. You need to call on an instance of ITraceProcessor the extension method .UseCpuSchedulingData() and then you get in ThreadActivity the CPU/Ready and Wait data along with call stacks just like in WPA.

    I have prepared a little sample which prints by process id from an etl file the CPU and Wait time summed across all threads:

    C>ContextSwitchParsing.exe test.etl
    PID:  Idle, CPU: 435395.380084 ms Wait:                0 ms
    PID: 18272, CPU: 18389.592761 ms Wait:    133794.380307 ms
    PID:  3892, CPU:  5156.316266 ms Wait:   1029236.568333 ms
    PID: 16876, CPU:  2861.243413 ms Wait:    174568.161760 ms
    PID: 10120, CPU:  2812.669957 ms Wait:     30600.500853 ms
    PID: 24808, CPU:  1771.803843 ms Wait:      7445.196940 ms
    PID:  1316, CPU:  1690.323850 ms Wait:   1064952.353463 ms
    PID:     4, CPU:  1641.261310 ms Wait:   1251175.067399 ms
    PID: 15436, CPU:  1438.144470 ms Wait:    286545.316703 ms
    PID: 32788, CPU:   541.485469 ms Wait:      1566.614792 ms
    

    That is nicely in sync with the output of WPA enter image description here

    Below is the source code. You need to add a reference to the Nuget package: Microsoft.Windows.EventTracing.Processing.All which will disappear and resolve to bunch of dependant nuget packages.

    using Microsoft.Windows.EventTracing;
    using Microsoft.Windows.EventTracing.Cpu;
    using System;
    using System.Collections.Generic;
    using System.Linq;
    
    namespace ContextSwitchParsing
    {
        class Program
        {
            static void Main(string[] args)
            {
                if(args.Length ==0)
                {
                    Console.WriteLine("Enter an ETL file Name to process");
                    return;
                }
    
                string fileName = args[0];
                new Program().Run(fileName);
            }
    
            class CpuWait
            {
                public decimal CPUTimeInMs { get; set; }
                public decimal WaitTimeInMs { get; set; }
            }
    
            enum ProcessId
            {
                Idle =0,
            }
    
            private void Run(string fileName)
            {
                using ITraceProcessor processor = TraceProcessor.Create(fileName, new TraceProcessorSettings
                {
                    AllowLostEvents = true
                });
    
                IPendingResult<ICpuSchedulingDataSource> schedulingResult = processor.UseCpuSchedulingData();
                processor.Process();
    
                Dictionary<ProcessId, CpuWait> cpuAndWaitPerProcess = new Dictionary<ProcessId, CpuWait>();
    
                foreach(ICpuThreadActivity activity in schedulingResult.Result.ThreadActivity)
                {
                    if (activity.Process == null)
                    {
                        continue;
                    }
    
                    ProcessId pid = (ProcessId)activity.Process.Id;
    
                    if ( !cpuAndWaitPerProcess.TryGetValue(pid, out CpuWait cpuAndWait) )
                    {
                        cpuAndWait = new CpuWait();
                        cpuAndWaitPerProcess.Add(pid, cpuAndWait);
                    }
    
                    cpuAndWait.CPUTimeInMs += activity.Duration.Duration.TotalMilliseconds;
    
                    if( activity.WaitingDuration.HasValue )
                    {
                        cpuAndWait.WaitTimeInMs += activity.WaitingDuration.Value.TotalMilliseconds;
                    }
                }
    
    
                KeyValuePair<ProcessId,CpuWait>[] byCpuDescending = cpuAndWaitPerProcess.OrderByDescending(x => x.Value.CPUTimeInMs).ToArray();
    
                foreach(var top in byCpuDescending.Take(10))
                {
                    Console.WriteLine($"PID: {top.Key,5}, CPU: {top.Value.CPUTimeInMs,12} ms Wait: {top.Value.WaitTimeInMs,16} ms");
                }
            }
        }
    }
    

    This works perfectly and it parses things very efficiently. Behind the scenes it uses the ContextSwitch data source but you need to know quite a bit of internals to understand it.

    A great trick of this library is that it groups all Context Switch events by processor because a Context Switch happens always on one processor so you can process all context switch events sorted by time grouped by CPU. Internally it uses a clever combination of MinHeap which is declared as

    MinHeap<IEnumerator, CSwitchIEnumeratorByTimeComparer>

    which contains the groupings.

    Once a new processor is reached the Minheap enumerates all events and creates a tree of data which is then consumed sorted by time until no events are there and then the next processor grouping es "expanded".

    Besides that I really like the clean design of time stamps which convert directly to Milli, Micro, Nanoseconds decimals or DateTimeOffset values depending on what you are after. Also the clear definition of a size type which declares everything what you could need in a strong type makes it crystal clear what values in which units you get:

        public readonly struct DataSize : IEquatable<DataSize>, IComparable<DataSize>, IComparable
        {
            public static readonly DataSize Zero;
            public static readonly DataSize MinValue;
            public static readonly DataSize MaxValue;
    
            public DataSize(long bytes);
    
            public decimal TotalMebibytes { get; }
            public decimal TotalKibibytes { get; }
            public decimal TotalTerabytes { get; }
            public decimal TotalGigabytes { get; }
            public decimal TotalMegabytes { get; }
            public decimal TotalKilobytes { get; }
            public long Bytes { get; }
            public decimal TotalGibibytes { get; }
            public decimal TotalTebibytes { get; }
    
            public static DataSize FromBytes(decimal bytes);
            public static DataSize FromBytes(long bytes);
            public static DataSize FromGibibytes(long gibibytes);
            public static DataSize FromGibibytes(decimal gibibytes);
            public static DataSize FromGigabytes(decimal gigabytes);
            public static DataSize FromGigabytes(long gigabytes);
            public static DataSize FromKibibytes(long kibibytes);
            public static DataSize FromKibibytes(decimal kibibytes);
            public static DataSize FromKilobytes(long kilobytes);
            public static DataSize FromKilobytes(decimal kilobytes);
            public static DataSize FromMebibytes(long mebibytes);
            public static DataSize FromMebibytes(decimal mebibytes);
            public static DataSize FromMegabytes(long megabytes);
            public static DataSize FromMegabytes(decimal megabytes);
            public static DataSize FromTebibytes(long tebibytes);
            public static DataSize FromTebibytes(decimal tebibytes);
            public static DataSize FromTerabytes(long terabytes);
            public static DataSize FromTerabytes(decimal terabytes);
            public static DataSize Max(DataSize first, DataSize second);
            public static DataSize Min(DataSize first, DataSize second);
            public int CompareTo(object other);
            public int CompareTo(DataSize other);
            public override bool Equals(object other);
            public bool Equals(DataSize other);
            public override int GetHashCode();
            public string ToString(bool useBinaryPrefixes);
            public override string ToString();
    
            public static DataSize operator +(DataSize left, DataSize right);
            public static DataSize operator -(DataSize left, DataSize right);
            public static bool operator ==(DataSize left, DataSize right);
            public static bool operator !=(DataSize left, DataSize right);
            public static bool operator <(DataSize left, DataSize right);
            public static bool operator >(DataSize left, DataSize right);
            public static bool operator <=(DataSize left, DataSize right);
            public static bool operator >=(DataSize left, DataSize right);
        }
    

    So far this is one of the ETW Libraries with a textbook design how you should implement a clear and easy to use API while keeping great performance.