I'm trying to implement an Azure Function with a Service Bus trigger that retries the processing of failed messages after a delay.
What appears to be happening, and what I see in the WebJobs code, is that if a function throws an exception, the message is abandoned by Azure WebJobs, so it is re-processed immediately, without a delay.
Using the FixedDelayRetryAttribute on the function to set up a retry policy partially works, but there are 3 issues with it:
There IS an option to handle message completion manually with AutoCompleteMessages=false and ServiceBusMessageActions, but it requires you to complete, dead-letter or defer the message for the framework not to abandon it (based on the WebJobs code and my testing).
In my case, to delay the retry I want to either leave the message locked until the lock expires or renew the lock.
Renewing the lock with ServiceBusMessageActions.RenewMessageLockAsync() or doing nothing does not exclude the message from the abandon operation, so it is still abandoned.
The best approach I've found so far is to complete the original message and create a new one with a scheduled delivery date in order to delay it. I suspect this is what I will have to live with (this is what Microsoft recommends at https://github.com/Azure/azure-service-bus/issues/454) but I'm still hoping for a more elegant solution.
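    // Requires: using System; using System.Threading; using System.Threading.Tasks;
    //           using Azure.Messaging.ServiceBus; using Microsoft.Azure.WebJobs;
    //           using Microsoft.Azure.WebJobs.ServiceBus;
    [FunctionName("MyFunc")]
    public async Task MyFunc(
        [ServiceBusTrigger("MyQueue", AutoCompleteMessages = false)] ServiceBusReceivedMessage message,
        [ServiceBus("MyQueue")] IAsyncCollector<ServiceBusMessage> collector,
        ServiceBusMessageActions messageActions,
        CancellationToken token)
    {
        try
        {
            // do work...
        }
        catch
        {
            // re-try attempt counting omitted
            // Complete the original message so it is not redelivered immediately.
            await messageActions.CompleteMessageAsync(message, CancellationToken.None);
            // Copy the original (body and properties) and schedule the copy for later delivery.
            await collector.AddAsync(new ServiceBusMessage(message)
                { ScheduledEnqueueTime = DateTimeOffset.UtcNow.AddMinutes(1) }, CancellationToken.None);
            await collector.FlushAsync(CancellationToken.None);
            throw;
        }
        await messageActions.CompleteMessageAsync(message, CancellationToken.None);
    }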
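The retry-attempt counting omitted above could be sketched roughly as follows. This is only an illustration, not something Service Bus provides: the RetryPlanner class, the "retry-attempt" property name and the exponential backoff policy are all assumptions made for the example. The idea is to carry the attempt number in the message's application properties, so it travels with each re-scheduled copy.

```csharp
using System;
using System.Collections.Generic;

public static class RetryPlanner
{
    // Hypothetical application property name; any key unused by the application would do.
    public const string AttemptKey = "retry-attempt";

    // Returns the next attempt number and the delay before it,
    // or null once maxAttempts has been exhausted.
    public static (int Attempt, TimeSpan Delay)? Next(
        IReadOnlyDictionary<string, object> properties, int maxAttempts, TimeSpan baseDelay)
    {
        // A missing property means the message has not been retried yet.
        int previous = properties.TryGetValue(AttemptKey, out var raw) ? Convert.ToInt32(raw) : 0;
        int attempt = previous + 1;
        if (attempt > maxAttempts)
        {
            return null;
        }

        // Exponential backoff: baseDelay, 2x, 4x, ... (exponent capped to avoid overflow).
        long factor = 1L << Math.Min(attempt - 1, 20);
        return (attempt, TimeSpan.FromTicks(baseDelay.Ticks * factor));
    }
}
```

In the catch block, the result would be computed from message.ApplicationProperties, the new attempt number written to the ApplicationProperties of the copied ServiceBusMessage under AttemptKey, and the returned delay added to DateTimeOffset.UtcNow for ScheduledEnqueueTime. When Next returns null, the message could be dead-lettered instead of re-scheduled.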
You've done most of the research and the solution mentioned in the GitHub issue is the way to go.
To add a little more context:
Failed messages are retried immediately by default, since the assumption is that the failure was caused by a transient error or a crashed worker.
The retry attributes in Azure Functions are in-code retries handled by the Functions runtime, which is why you only see a single message delivery and why the retry state does not survive host restarts. This is by design.
The reason for the default is the "First In, First Out" nature of queues. If your scenario does not require this guarantee, the best approach would be to dead-letter failures and have a separate function re-schedule the messages back into the worker queue.
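A minimal sketch of that last suggestion might look like the following. It assumes the same "MyQueue" name as in the question, a one-minute delay chosen arbitrarily, and that the dead-letter sub-queue can be addressed as "MyQueue/$DeadLetterQueue" in the trigger; the function and class names are made up for the example. Treat it as an outline to adapt rather than a tested implementation.

```csharp
using System;
using System.Threading.Tasks;
using Azure.Messaging.ServiceBus;
using Microsoft.Azure.WebJobs;

public class DeadLetterRescheduler
{
    // Hypothetical function that drains the dead-letter sub-queue of MyQueue.
    [FunctionName("RescheduleDeadLetters")]
    public async Task Run(
        [ServiceBusTrigger("MyQueue/$DeadLetterQueue")] ServiceBusReceivedMessage deadLettered,
        [ServiceBus("MyQueue")] IAsyncCollector<ServiceBusMessage> collector)
    {
        // Copy the dead-lettered message and schedule the copy back onto the worker queue.
        var retry = new ServiceBusMessage(deadLettered)
        {
            ScheduledEnqueueTime = DateTimeOffset.UtcNow.AddMinutes(1) // assumed delay
        };
        await collector.AddAsync(retry);
        // With the default AutoCompleteMessages = true, the dead-lettered original
        // is completed once this function returns without throwing.
    }
}
```

The worker function would then dead-letter a failed message through ServiceBusMessageActions instead of completing it and re-sending it itself. Some form of attempt counting is still needed, otherwise a message that always fails would cycle between the two queues indefinitely.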