Search code examples
c#sql-serverservice-broker

SQL Server Service Broker External Activator Checkpointing Error


We have been using Microsoft's Service Broker External Activator Service for some time now to handle external activation of Service Broker queues without issue. But over the last week an error keeps occurring that I can't get to the bottom of.

At least once per day, at random times, the service will encounter an error and become stuck in the Stopping state. At this point all that can be done is to kill the process and restart the service. Checking the EATrace.log file reveals the following error:

18/07/2018 09:59:45 EXCEPTION
ERROR = 90, Internal exceptions have occurred when External Activator is runtime checkpointing.
18/07/2018 09:59:45 EXCEPTIONDETAILS Inner Exception:
18/07/2018 09:59:45 EXCEPTIONDETAILS System.IO.IOException: Cannot create a file when that file already exists.
18/07/2018 09:59:45 EXCEPTIONDETAILS 
18/07/2018 09:59:45 EXCEPTIONDETAILS    at System.IO.__Error.WinIOError(Int32 errorCode, String maybeFullPath)
18/07/2018 09:59:45 EXCEPTIONDETAILS    at System.IO.File.Move(String sourceFileName, String destFileName)
18/07/2018 09:59:45 EXCEPTIONDETAILS    at ExternalActivator.LogManager.SaveRecoveryContext(LogRecoveryContext recoveryContext)
18/07/2018 09:59:45 EXCEPTIONDETAILS    at ExternalActivator.LogManager.Checkpoint(LogRecoveryContext recoveryContext)
18/07/2018 09:59:45 EXCEPTIONDETAILS    at ExternalActivator.LogManager.Log(LogRecord recoveryLogRec)
18/07/2018 09:59:45 EXCEPTIONDETAILS    at ExternalActivator.ApplicationMonitor.OnProcessExited(ProcessMonitor processMonitor)
18/07/2018 09:59:45 EXCEPTIONDETAILS    at ExternalActivator.ProcessMonitor.NotifySubscriber()
18/07/2018 09:59:45 EXCEPTIONDETAILS    at ExternalActivator.ProcessMonitor.OnProcessExited(Object a, EventArgs b)

Using Reflector I have found the offending SaveRecoveryContext method:

private void SaveRecoveryContext(LogRecoveryContext recoveryContext)
    {
      this.m_file = File.Open(this.TempLogFileName, FileMode.Create, FileAccess.Write, FileShare.Read);
      foreach (LogRecord recoveryLogRec in recoveryContext.List)
        this.Write(recoveryLogRec);
      this.CloseFiles();
      File.Delete(this.LogFileName);
      File.Move(this.TempLogFileName, this.LogFileName);
      this.m_file = File.Open(this.LogFileName, FileMode.Append, FileAccess.Write, FileShare.Read);
    }

Note that LogFileName is EARecovery.rlog and TempLogFileName is EARecovery_temp.rlog. When checking the log folder after the error occurs there is only the temp file and the original has been deleted as expected.

My first thought was that perhaps multiple threads were attempting to checkpoint at the same time and were tripping each other up but going up the stack trace takes us to the following:

public void Log(LogRecord recoveryLogRec)
    {
      lock (this)
      {
        this.Write(recoveryLogRec);
        if (!recoveryLogRec.CanCompress)
          return;
        ++this.m_recordsToCompress;
        if (this.m_recordsToCompress <= 100)
          return;
        LogRecoveryContext local_0 = new LogRecoveryContext();
        string local_1 = Global.GetEaContext();
        Global.SetEaContext(Localized.GL_EaContext_RuntimeCheckpoint);
        this.Checkpoint(local_0);
        Global.SetEaContext(local_1);
      }
    }

I would have expected the lock statement to protect against this.

Nothing has changed on the server aside from the usual Windows Udpates and load has not increased, but for some reason this error has just started occurring since around the 16th July. Turning on Verbose logging I can see that it is checkpointing far more that I would anticipate, and when the error occurs it is always within a second or two of a previous checkpoint.

Any help or pointers on where to look next would be greatly appreciated as I'm tearing my hair out over this one.


Solution

  • This isn't specific to Service broker, but I have seen this happen before, especially with the code:

      File.Delete(this.LogFileName);
      File.Move(this.TempLogFileName, this.LogFileName);
    

    If a background process (like antivirus software) is holding the file open, the file may not disappear right away. If you are running antivirus software, like Windows Defender or some other product, you may need to white list the folder so that it doesn't try and scan the files.

    The only other suggestion I can make, again unrelated, is to clear out the Windows temp folder. I ran into something some years ago, where creating a number of temp files caused issues with Windows, based on how they either name or track them internally. I don't have a link for you, and I don't think that's your issue here, but something you may want to try.