Tags: .net, azure-functions, azureservicebus

Azure Functions Service Bus trigger with delayed, configurable, persistent retries


I'm trying to implement an Azure Function with a Service Bus trigger that retries failed messages after a delay.
From what I observe, and from what I see in the WebJobs code, when a function throws an exception the message is abandoned by Azure WebJobs, so it is re-processed immediately, without any delay.

Using the FixedDelayRetryAttribute on the function to set up a retry policy partially works, but it has three issues:

  1. It is not configurable: the number of retries and the delay have to be hardcoded.
  2. It does not survive host restarts. If the function app is stopped or scaled in and the instance running the function goes away, the retries are not resumed after the restart.
  3. Retries made by the retry policy don't count as message deliveries, which is confusing (e.g. if it retries 3 times and fails, that counts as only one ASB message delivery).
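For reference, this is roughly what the retry policy looks like on a function. It is only a sketch: the class name, function name and values are made up for illustration. It shows why issue 1 arises, since attribute arguments have to be compile-time constants.

```csharp
using System.Threading.Tasks;
using Azure.Messaging.ServiceBus;
using Microsoft.Azure.WebJobs;

public class RetryPolicyExample
{
    // Hypothetical function name; "MyQueue" matches the queue used elsewhere in this post.
    [FunctionName("MyFuncWithRetryPolicy")]
    // 3 retries, 1 minute apart. Both values are attribute arguments,
    // so they must be constants known at compile time.
    [FixedDelayRetry(3, "00:01:00")]
    public Task Run([ServiceBusTrigger("MyQueue")] ServiceBusReceivedMessage message)
    {
        // do work...; an exception thrown here is what triggers the retry policy
        return Task.CompletedTask;
    }
}
```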

There IS an option to handle message completion manually with AutoCompleteMessages=false and ServiceBusMessageActions, but the function then has to complete, dead-letter or defer the message for the framework not to abandon it (based on the WebJobs code and my testing).
In my case, to delay the retry I want to either leave the message locked until the lock expires or renew the lock. Renewing the lock with ServiceBusMessageActions.RenewMessageLockAsync(), or doing nothing at all, does not exclude the message from the abandon operation, so it is still abandoned.

The best approach I've found so far is to complete the original message and send a new one with a scheduled enqueue time in order to delay it. I suspect this is what I will have to live with (it is what Microsoft recommends at https://github.com/Azure/azure-service-bus/issues/454), but I'm still hoping for a more elegant solution.

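// Namespaces used by this snippet (the method itself lives inside a class):
using System;
using System.Threading;
using System.Threading.Tasks;
using Azure.Messaging.ServiceBus;
using Microsoft.Azure.WebJobs;
using Microsoft.Azure.WebJobs.ServiceBus;

[FunctionName("MyFunc")]
public async Task MyFunc(
    [ServiceBusTrigger("MyQueue", AutoCompleteMessages = false)] ServiceBusReceivedMessage message,
    [ServiceBus("MyQueue")] IAsyncCollector<ServiceBusMessage> collector,
    ServiceBusMessageActions messageActions,
    CancellationToken token)
{
    try
    {
        // do work...
    }
    catch
    {
        // Retry attempt counting omitted.
        // Copy the original message so the scheduled retry carries the same body and properties.
        var retry = new ServiceBusMessage(message)
        {
            ScheduledEnqueueTime = DateTimeOffset.UtcNow.AddMinutes(1)
        };
        // Send the copy first, so the message is not lost if sending fails.
        await collector.AddAsync(retry, CancellationToken.None);
        await collector.FlushAsync(CancellationToken.None);
        // Settle the original so the framework does not abandon it.
        await messageActions.CompleteMessageAsync(message, CancellationToken.None);
        throw;
    }

    await messageActions.CompleteMessageAsync(message, CancellationToken.None);
}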
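The attempt counting left out above could be sketched along these lines. Everything here is an assumption for illustration: the `RetryAttempt` property name, the `RetryTracking` helper and the idea of reading the limits from app settings (which Azure Functions exposes as environment variables) are not part of any library.

```csharp
using System;
using System.Collections.Generic;

public static class RetryTracking
{
    // Hypothetical application property that carries the attempt count between deliveries.
    public const string AttemptProperty = "RetryAttempt";

    // Returns how many times the message has already been re-scheduled (0 if never).
    // Convert.ToInt32 tolerates the value coming back as a different integer type.
    public static int GetAttempt(IReadOnlyDictionary<string, object> properties)
    {
        return properties.TryGetValue(AttemptProperty, out var value)
            ? Convert.ToInt32(value)
            : 0;
    }

    // Reads an integer app setting, falling back to a default when it is missing or invalid.
    public static int GetIntSetting(string name, int defaultValue)
    {
        var raw = Environment.GetEnvironmentVariable(name);
        return int.TryParse(raw, out var parsed) ? parsed : defaultValue;
    }

    // Returns the delay before the next attempt, or null once the retry budget is used up.
    public static TimeSpan? NextDelay(int attempt, int maxAttempts, int delaySeconds)
    {
        return attempt < maxAttempts
            ? TimeSpan.FromSeconds(delaySeconds)
            : (TimeSpan?)null;
    }
}
```

In the catch block, the idea would be to call `GetAttempt(message.ApplicationProperties)`, ask `NextDelay` for the delay, and either dead-letter the message when it returns null or set `AttemptProperty` to `attempt + 1` on the scheduled copy. Because the count travels with the message, it survives host restarts, and the limits can be changed without recompiling.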

Solution

  • You've already done most of the research, and the approach described in the GitHub issue is the way to go.

    To add a little more context:

    1. By default, failed messages are retried immediately, on the assumption that the failure was caused by a transient error or a crashed worker.

    2. The retry attributes in Azure Functions are in-code retries handled by the Functions runtime, which is why you see only a single message delivery and why they do not survive host restarts. This is by design.

    The reason for this default is the first-in, first-out (FIFO) nature of queues. If your scenario does not require that guarantee, the best approach is to dead-letter the failed messages and have a separate function re-schedule them back into the worker queue.
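A minimal sketch of that second function, under some assumptions: that the dead-letter sub-queue can be addressed as `MyQueue/$DeadLetterQueue` in the trigger, that a one-minute delay is acceptable, and that the class and function names (made up here) suit your app.

```csharp
using System;
using System.Threading.Tasks;
using Azure.Messaging.ServiceBus;
using Microsoft.Azure.WebJobs;

public class DeadLetterResubmitter
{
    // Assumption: "<queue>/$DeadLetterQueue" addresses the dead-letter sub-queue of "MyQueue".
    [FunctionName("ResubmitDeadLetters")]
    public async Task Run(
        [ServiceBusTrigger("MyQueue/$DeadLetterQueue")] ServiceBusReceivedMessage deadLettered,
        [ServiceBus("MyQueue")] IAsyncCollector<ServiceBusMessage> collector)
    {
        // Copy the dead-lettered message and schedule the copy one minute from now.
        var retry = new ServiceBusMessage(deadLettered)
        {
            ScheduledEnqueueTime = DateTimeOffset.UtcNow.AddMinutes(1)
        };

        await collector.AddAsync(retry);
        // Auto-complete is left at its default here, so returning without an exception
        // settles the dead-lettered message.
    }
}
```

As written this re-schedules every dead-lettered message indefinitely, so in practice it needs some cap on the number of attempts, for example a counter kept in the message's application properties.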