Search code examples
c#asp.net.netiisowin

IIS App Pool crashes about 20 seconds after an OperationCancelledException is logged by middleware


On our production server, intermittently throughout the day, we're seeing this get logged:

A process serving application pool 'MyApi' suffered a fatal communication error with the Windows Process Activation Service. The process id was '29568'. The data field contains the error number.

I've seen suggestions to fix this like setting some 32-bit thing in IIS, but I have no control over IIS at the production level. Also, my company runs dozens of app pools and only the one I'm investigating has this problem. So I have ruled out some kind of IIS configuration issue.

I have run the crash dump through DebugDiag and this is what gets reported:

In w3wp.exe.18080.dmp the assembly instruction at iiscore!W3_CONTEXT_BASE::GetIsLastNotification+62 in C:\Windows\System32\inetsrv\iiscore.dll from Microsoft Corporation has caused an access violation exception (0xC0000005) when trying to read from memory location 0xd7f76008 on thread 94

I've tried googling GetIsLastNotification and found this in the MS docs:

Do not use PreSendRequestHeaders with managed modules that implement IHttpModule. Setting these properties can cause issues with asynchronous requests. The combination of Application Requested Routing (ARR) and websockets might lead to access violation exceptions that can cause w3wp to crash. For example, iiscore!W3_CONTEXT_BASE::GetIsLastNotification+68 in iiscore.dll has caused an access violation exception (0xC0000005).

It says not to use PreSendRequestHeaders with modules that implement IHttpModule. I have verified that no code the entire application does this. I also verified that no code in all our company libraries do this, either.


Here is something very interesting and unusual. About 20 seconds before every app pool crash, I see this getting logged: System.OperationCanceledException: The operation was canceled.

I have determined that this error is coming from some Owin middleware. In the application's Startup.cs file, we register a class that does some logging. It logs this OperationCanceledException because of this code:

// This class inherits from OwinMiddleware

public override async Task Invoke(IOwinContext context)
{
  try
  {
    await this.Next.Invoke(context);
  }
  catch (Exception ex)
  {
    // log stuff
  }
}

All that happens here is when a http request is canceled, await.this.Next.Invoke(context) throws the exception because that's what it's supposed to do. This doesn't seem like a big deal, but the question boils down to this: How does a request being canceled lead to an app pool crash ~20 seconds later?


Solution

  • After tremendous effort, the answer to this was found. The "20 seconds" thing referenced in the title ended up being a red herring cause by some delayed logging. But below is the cause of the app pool crashes.

    In the application's Startup.cs file, we were registering some Owin middleware. The middleware looked something like this:

    public override async Task Invoke(IOwinContext context)
    {      
      try
      {
        await Next.Invoke(context);
      }
      catch (Exception ex)
      {
        // log the error and return a 500 response
        await LogAndRespond(context, ex);
      }
    }
    

    The problem with this is that when a request is canceled by a client invoking this api, Next.Invoke(context) throws an OperationCanceledException. In the catch block, we logged this error but more importantly, we returned a response to the canceled request.

    I don't fully understand why this would cause the entire app pool to crash. I guess the Middleware's attempt to respond to a closed connection caused the memory access violation. Regardless, the solution was to NOT send a response. The final code ended up looking like this.

    public override async Task Invoke(IOwinContext context)
    {
      try
      {
        await Next.Invoke(context);
      }
      catch (OperationCanceledException)
      {
        // Log the canceled request as info, but do NOT send it a response
        _logger.LogInformation("Request has been canceled: {Url}", context.Request.Uri);
      }
      catch (Exception ex)
      {
        // log the error and return a 500 response
        await LogAndRespond(context, ex);
      }
    }