
What kinds of exceptions trigger retries in the Azure Service Bus SubscriptionClient?


I am struggling with messages ending up in the dead-letter queue too quickly. I have specified an exponential retry policy (RetryExponential) like this:

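        // Microsoft.Azure.ServiceBus: RetryExponential(minimumBackoff, maximumBackoff, maximumRetryCount)
        private readonly RetryExponential _retryPolicy = new RetryExponential(
            TimeSpan.FromSeconds(1),   // minimum backoff
            TimeSpan.FromMinutes(20),  // maximum backoff
            10);                       // maximum retry count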

When SQL Server is temporarily unavailable (the error ends with "because its replica role is RESOLVING which does not allow connections. Try the operation again later."), the messages end up in the dead-letter queue without any retries.

What do I need to do to get retries?

I looked in the source code for the SDK, and it seems that only transient ServiceBusExceptions trigger retries, but I find that quite odd.

UPDATE:

After discussing this with my coworkers and looking more closely at Application Insights, I can see that the message actually was retried 10 times, but all within 2 seconds. These retries come from the Max Delivery Count set on the subscription itself, not from the client, and they are not subject to the exponential backoff I want and need.

(The Max Delivery Count on the subscription is 10.)
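To make the contrast concrete, here is a small sketch of the kind of spacing the question is after, using the values from the RetryExponential call above. It is a plain doubling schedule without jitter, chosen for illustration; it is not the formula RetryExponential uses internally, and the `BackoffSchedule` class is hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Illustration only: a plain doubling schedule (no jitter). This is NOT the
// formula RetryExponential uses internally; it only shows the kind of spacing
// wanted, as opposed to 10 redeliveries within 2 seconds.
public static class BackoffSchedule
{
    // Delay before retry n (0-based): min * 2^n, capped at max.
    public static List<TimeSpan> Delays(TimeSpan min, TimeSpan max, int retries)
    {
        var delays = new List<TimeSpan>();
        for (int n = 0; n < retries; n++)
        {
            double seconds = min.TotalSeconds * Math.Pow(2, n);
            delays.Add(TimeSpan.FromSeconds(Math.Min(seconds, max.TotalSeconds)));
        }
        return delays;
    }

    // Total time spent waiting across all retries.
    public static TimeSpan TotalWait(IEnumerable<TimeSpan> delays) =>
        TimeSpan.FromSeconds(delays.Sum(d => d.TotalSeconds));
}
```

With 1 second, 20 minutes and 10 retries, the delays are 1, 2, 4, ..., 512 seconds, about 17 minutes of waiting in total, so the 20-minute cap is never reached. The broker, on the other hand, makes an abandoned message available again right away, which matches the 10 deliveries within 2 seconds seen in Application Insights.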


Solution

    > I looked in the source code for the SDK, and it seems that only transient ServiceBusExceptions trigger retries, but I find that quite odd.

    This is correct and by design. When an error is transient (a connectivity issue, throttling, etc.), the client retries the operation according to its RetryPolicy. Any other error is not something a retry would fix, so repeating the same operation would not help.

    In your specific case, your code runs and the message is received and processed. There is no problem between the client and the broker, which is why the retry policy does not kick in. Your update also confirms that this is an application-level issue: the message is delivered to your process again and again and ends up in the dead-letter queue.

    > I am struggling with messages ending up in the dead-letter queue too quickly.

    What you need is application-level retries with backoff, so that the message is not redelivered several times in quick succession and then dead-lettered. This can be a bit trickier, depending on how long your SQL Server is down. There are a few options:

    1. Schedule the message for a later time to give SQL Server a chance to recover. This option requires scheduling a new message.
    2. Defer the message and handle it later. This means you need to keep a record of the deferred messages' SequenceNumbers, because a deferred message can only be received again by its sequence number. One way to implement this option is to schedule a new message that carries the sequence number of the original message.
    3. Use an abstraction on top of Service Bus that provides retries as a higher-level feature, e.g. MassTransit or NServiceBus (Recoverability).
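    A minimal sketch of option 1, assuming the Microsoft.Azure.ServiceBus package (the SDK that provides SubscriptionClient and RetryExponential) and a topic client for the same topic the subscription belongs to. The retry counter lives in a hypothetical user property `AppRetryCount`; `ProcessAsync` and `IsTransientDatabaseError` are hypothetical placeholders for your own SQL work and error classification. Keep in mind that a message scheduled on the topic is delivered to every subscription whose filter matches it, so with several subscriptions you would need a filter to keep the copy from reaching all of them.

```csharp
// Sketch only: assumes the Microsoft.Azure.ServiceBus NuGet package.
using System;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Azure.ServiceBus;

public class RescheduleOnFailure
{
    private const string RetryCountProperty = "AppRetryCount"; // hypothetical property name
    private const int MaxAppRetries = 10;

    private readonly ISubscriptionClient _subscriptionClient;
    private readonly ITopicClient _topicClient; // same topic as the subscription

    public RescheduleOnFailure(ISubscriptionClient subscriptionClient, ITopicClient topicClient)
    {
        _subscriptionClient = subscriptionClient;
        _topicClient = topicClient;
    }

    // Doubling delay (1s, 2s, 4s, ...) capped at 20 minutes; illustrative choice.
    public static TimeSpan ComputeDelay(int retryCount)
    {
        double seconds = Math.Pow(2, retryCount);
        return TimeSpan.FromSeconds(Math.Min(seconds, TimeSpan.FromMinutes(20).TotalSeconds));
    }

    public void Start()
    {
        var options = new MessageHandlerOptions(args => Task.CompletedTask)
        {
            AutoComplete = false // the handler completes or dead-letters explicitly
        };
        _subscriptionClient.RegisterMessageHandler(HandleAsync, options);
    }

    private async Task HandleAsync(Message message, CancellationToken token)
    {
        string lockToken = message.SystemProperties.LockToken;
        try
        {
            await ProcessAsync(message);
            await _subscriptionClient.CompleteAsync(lockToken);
        }
        catch (Exception ex) when (IsTransientDatabaseError(ex))
        {
            int retryCount = message.UserProperties.TryGetValue(RetryCountProperty, out object value)
                ? Convert.ToInt32(value)
                : 0;

            if (retryCount >= MaxAppRetries)
            {
                await _subscriptionClient.DeadLetterAsync(lockToken, "Application retries exhausted");
                return;
            }

            // Schedule a copy for later, then complete the original so the
            // broker does not redeliver it immediately.
            Message retry = message.Clone();
            retry.UserProperties[RetryCountProperty] = retryCount + 1;
            await _topicClient.ScheduleMessageAsync(retry, DateTimeOffset.UtcNow + ComputeDelay(retryCount));
            await _subscriptionClient.CompleteAsync(lockToken);
        }
        // Other exceptions propagate; the handler pump abandons the message and
        // the subscription's Max Delivery Count applies as before.
    }

    // Hypothetical placeholder for the SQL Server work.
    private static Task ProcessAsync(Message message) => Task.CompletedTask;

    // Hypothetical placeholder: replace with a real check for transient SQL errors.
    private static bool IsTransientDatabaseError(Exception ex) => true;
}
```

    Scheduling the copy before completing the original means that a failure between the two calls produces a duplicate rather than a lost message, so the processing should be idempotent.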