c#, asp.net-core, azure-service-fabric, etw, event-flow

'Insufficient system resources' when I listen to ETW events with EventFlow on a Service Fabric cluster


I have an ETW listener using EventFlow running on Service Fabric.

This is my configuration file (eventFlowConfig.json):

{
  "inputs": [
    {
      "type": "ETW",
      "sessionNamePrefix": "MyListenerService",
      "cleanupOldSessions": true,
      "reuseExistingSession": true,
      "providers": [
        {
          "providerName": "Provider0"
        }
      ]
    }
  ],
  "filters": [],
  "outputs": [
    {
      "type": "CustomOutput"
    }
  ],
  "schemaVersion": "2018-04-04",

  "extensions": [
    {
      "category": "outputFactory",
      "type": "CustomOutput",
      "qualifiedTypeName": "MyNamespace.EventFlow.Outputs.CustomOutputFactory, MyAssembly"
    }
  ]
}

And this is my entry point:

private static void Main()
{
    try
    {
        string configurationFileName = "eventFlowConfig.json";

        using (var diagnosticsPipeline = ServiceFabricDiagnosticPipelineFactory.CreatePipeline("MyService", configurationFileName))
        {
            ServiceRuntime.RegisterServiceAsync("MyServiceType",
                context => new Service(context)).GetAwaiter().GetResult();

            ServiceEventSource.Current.ServiceTypeRegistered(Process.GetCurrentProcess().Id, typeof(Service).Name);
            // Prevents this host process from terminating so the service keeps running.
            Thread.Sleep(Timeout.Infinite);
        }
    }
    catch (Exception e)
    {
        ServiceEventSource.Current.ServiceHostInitializationFailed(e.ToString());
        throw;
    }
}

When I start and stop my service several times in my local cluster while debugging, I get this exception:

System.Runtime.InteropServices.COMException: 'Insufficient system resources exist to complete the requested service. (Exception from HRESULT: 0x800705AA)'

I cannot restart the service until I reboot the computer. The problem is that I get the same exception in environments other than my local one.

I've tried the approach from "TraceEventSession usage in ServiceFabric application raises insufficient resource error": my service is stateless and runs only one instance per node.

Shouldn't this configuration be enough to free or reuse ETW sessions?

"sessionNamePrefix": "MyListenerService",
"cleanupOldSessions": true,
"reuseExistingSession": true,

Has anyone else encountered this problem?

Edit: After @Diego Mendes's answer, I ran logman -ets and got this:

...
EventFlow-EtwInput-a8aefb3c-594f-4ac7-b9d8-6da1791fb122 Trace                         Running
EventFlow-EtwInput-fe5f58e6-d1a7-4198-95b2-d343584cf46b Trace                         Running
EventFlow-EtwInput-33f67287-5563-4835-b3a1-5527e4fc5e5e Trace                         Running
EventFlow-EtwInput-959eef04-a5ae-47eb-9b7e-057a9fd3fb28 Trace                         Running
EventFlow-EtwInput-0095f186-d657-4974-a613-213d7eb49def Trace                         Running
EventFlow-EtwInput-8fbc52f5-2de6-4826-bce2-36d8abf0c264 Trace                         Running
EventFlow-EtwInput-8e654b40-c299-48f4-818e-5ebe3c2341a4 Trace                         Running
EventFlow-EtwInput-7ec63ec9-428b-4658-b059-698b5ae66986 Trace                         Running

Is EventFlow ignoring my sessionNamePrefix and replacing it with EventFlow-EtwInput? Could this be a bug in EventFlow?

I will try to use EventFlow-EtwInput as my sessionNamePrefix.


Solution

  • As you pointed out, this happens because you are starting and stopping your service multiple times. Each time you start the service, a new session is created; in Debug mode, the debugger kills the process before it can close the active sessions.

    From Matt's answer that you linked:

    Windows has a limit of 64 ETW sessions that can be running concurrently. Consider using a single stateless app running on every node to create a single session.

    When it happens again, you can check whether any sessions were left open by running this command:

    logman -ets

    It lists all active sessions; yours will probably be displayed as something like this:

    MyListenerService-A402EE30-53B7-48E4-B602-76B101C0AB97

    If you have multiple sessions active, it is because the old sessions are not being closed properly and are not being reused either.

    In the configuration, the two settings you are using do the following:

    cleanupOldSessions: If set to TRUE, existing ETW trace sessions matching the sessionNamePrefix will be closed. This helps to collect leftover session instances, as there is a limit on their number.

    reuseExistingSession: If turned on, then an existing trace session matching the sessionNamePrefix will be re-used. If cleanupOldSessions is also turned on, then it will leave one session open for re-use.

    In your settings both are turned on; I would try tweaking these values to see if that solves the problem.
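If leftover sessions have already piled up, they can usually be stopped without a reboot. Below is a minimal sketch, assuming Windows and an elevated (administrator) process, that shells out to logman (`logman query -ets` to list, `logman stop <name> -ets` to stop) and approximates the documented cleanupOldSessions/reuseExistingSession behaviour. The class `EtwSessionCleanup` and its methods are hypothetical illustrations, not EventFlow's own implementation, and which matching session gets kept is an arbitrary choice here.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

// Hypothetical helper: approximates cleanupOldSessions / reuseExistingSession
// by calling logman. This is NOT how EventFlow implements those settings.
public static class EtwSessionCleanup
{
    // Returns session names from `logman query -ets` output that start with the prefix.
    // Assumes matching names contain no spaces (true for "<prefix>-<guid>" names).
    public static List<string> FindSessions(string logmanOutput, string prefix)
    {
        return logmanOutput
            .Split(new[] { '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries)
            .Select(line => line.Trim())
            .Where(line => line.StartsWith(prefix, StringComparison.OrdinalIgnoreCase))
            // A null separator splits on whitespace; the first token is the session name.
            .Select(line => line.Split((char[])null, StringSplitOptions.RemoveEmptyEntries)[0])
            .ToList();
    }

    // With keepOneForReuse (both settings on), the first match is left running.
    public static List<string> SelectSessionsToStop(IList<string> sessions, bool keepOneForReuse)
    {
        return keepOneForReuse ? sessions.Skip(1).ToList() : sessions.ToList();
    }

    // Runs logman with the given arguments and returns its standard output.
    private static string RunLogman(string arguments)
    {
        var psi = new ProcessStartInfo("logman", arguments)
        {
            RedirectStandardOutput = true,
            UseShellExecute = false,
            CreateNoWindow = true
        };
        using (var process = Process.Start(psi))
        {
            string output = process.StandardOutput.ReadToEnd();
            process.WaitForExit();
            return output;
        }
    }

    // Lists active sessions and stops the ones selected above.
    public static void Cleanup(string prefix, bool keepOneForReuse)
    {
        string output = RunLogman("query -ets");
        var toStop = SelectSessionsToStop(FindSessions(output, prefix), keepOneForReuse);
        foreach (string name in toStop)
        {
            Console.WriteLine($"Stopping leftover ETW session {name}");
            RunLogman($"stop \"{name}\" -ets");
        }
    }
}
```

For example, with the service stopped, `EtwSessionCleanup.Cleanup("EventFlow-EtwInput", keepOneForReuse: false);` would stop every session shown in the logman -ets output from the question. Running it while the service is up would also stop the session the service is currently using.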