
Azure queue-triggered message failing to delete


According to the documentation and numerous SO questions, my understanding is that when a queued message fails the configured number of times (in this case 5), it is automatically moved from the current queue into the poison queue.

Unfortunately, in my limited experience this is only partially true. When the job fails the maximum dequeue count, the message is automatically added to the poison queue, but it is not removed from the original queue. It is then reprocessed after an apparently unchangeable 10-minute delay and added to the poison queue again, creating duplicates, and it is still not removed.

I implemented my own IQueueProcessorFactory, which creates a custom QueueProcessor that overrides DeleteMessageAsync. With it, I was able to confirm that the method is called once the exception has been thrown 5 times and that it finishes without exceptions, yet the message remains in the queue. I have also tried deleting both the normal and poison queues.

The code I'm using:

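using System;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Azure.WebJobs;
using Microsoft.Azure.WebJobs.Host.Queues;
using Microsoft.WindowsAzure.Storage.Queue;

public class Program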
{
    private const string QUEUE_NAME = "some-queue";

    // Please set the following connection strings in app.config for this WebJob to run:
    // AzureWebJobsDashboard and AzureWebJobsStorage
    static void Main()
    {
        var config = new JobHostConfiguration();

        config.Queues.QueueProcessorFactory = new CustomFactory();
        var host = new JobHost(config);
        // The following code ensures that the WebJob will be running continuously
        host.RunAndBlock();
    }

    private class CustomFactory : IQueueProcessorFactory
    {
        public QueueProcessor Create(QueueProcessorFactoryContext context)
        {
            return new CustomQueueProcessor(context);
        }

        private class CustomQueueProcessor : QueueProcessor
        {
            public CustomQueueProcessor(QueueProcessorFactoryContext context) : base(context)
            {

            }

            protected override Task DeleteMessageAsync(CloudQueueMessage message, CancellationToken cancellationToken)
            {
                return base.DeleteMessageAsync(message, cancellationToken);
            }
        }
    }

    public static void QueueTrigger([QueueTrigger(QUEUE_NAME)] CloudQueueMessage message)
    {
        Console.WriteLine($"Processing message: {message.AsString}");
        throw new Exception("test exception");
    }
}

Everything works as expected except that the message remains in the original queue. I'm assuming (and hoping) the error is on my end, or that it's something simple I overlooked because I am new to queues. But after spending almost two days trawling the internet for information, I am officially at a loss as to what to try next.

Edit

While we did end up going with Service Bus, it is worth noting that we came up with an alternative which was to semi-manage the queue ourselves from within the queue trigger.

This entailed checking the dequeue count and, if it is above the max dequeue (retry) count, simply returning. That signals to the caller that the message was "successfully" processed, so it is then removed from the queue. The result is almost the expected behavior: the message is added to the poison queue and is removed from the normal queue 10 minutes later.

It also has the added benefit of continuing to work with future package releases or queue updates that fix the original problem, because the if condition would then simply never be true.

public class Program
{
    private const int MAX_DEQUEUE_COUNT = 5;

    static void Main()
    {
        var config = new JobHostConfiguration();
        ...
        config.Queues.MaxDequeueCount = MAX_DEQUEUE_COUNT;
        ...
    }

    public static void QueueTrigger([QueueTrigger("some-queue")] CloudQueueMessage message)
    {
        if (message.DequeueCount > MAX_DEQUEUE_COUNT)
        {
            // prevents the message from indefinitely retrying every 10 minutes and ultimately creating duplicates within the poison queue.
            return;
        }

        // do stuff
    }
}
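To see why the dequeue-count guard stops the duplicates, the flow can be modeled without Azure at all. This is a hypothetical simulation, not WebJobs SDK code: DequeueSimulation, SimulationResult and Run are illustrative names, and the model assumes the behavior reported in the question (the copy to the poison queue succeeds, but deleting the original silently has no effect). The 10-minute reappearance is reduced to another loop iteration.

```csharp
using System;

// Outcome of one simulated run (hypothetical type, for illustration only).
public sealed class SimulationResult
{
    public int Attempts { get; set; }
    public int PoisonCopies { get; set; }
    public bool RemovedFromQueue { get; set; }
}

// Models the reported behavior; it does NOT call the WebJobs SDK.
public static class DequeueSimulation
{
    // maxAttempts bounds the loop, because without the guard the message never leaves the queue.
    public static SimulationResult Run(bool useGuard, int maxDequeueCount, int maxAttempts)
    {
        var result = new SimulationResult();
        int dequeueCount = 0;

        while (result.Attempts < maxAttempts)
        {
            // The runtime dequeues the message, which increments its dequeue count.
            dequeueCount++;
            result.Attempts++;

            // The workaround: return early, which the runtime treats as success,
            // and a successful run's delete is assumed to work normally.
            if (useGuard && dequeueCount > maxDequeueCount)
            {
                result.RemovedFromQueue = true;
                break;
            }

            // Otherwise the function throws, as in the question's QueueTrigger.
            if (dequeueCount >= maxDequeueCount)
            {
                // Copied to the poison queue; per the reported behavior the original
                // is not deleted and becomes visible again later.
                result.PoisonCopies++;
            }
            // Below the limit the message is simply released for another retry.
        }

        return result;
    }

    public static void Main()
    {
        var withoutGuard = Run(useGuard: false, maxDequeueCount: 5, maxAttempts: 8);
        var withGuard = Run(useGuard: true, maxDequeueCount: 5, maxAttempts: 8);

        // prints "without guard: poison copies = 4, removed = False"
        Console.WriteLine($"without guard: poison copies = {withoutGuard.PoisonCopies}, removed = {withoutGuard.RemovedFromQueue}");
        // prints "with guard: poison copies = 1, removed = True"
        Console.WriteLine($"with guard: poison copies = {withGuard.PoisonCopies}, removed = {withGuard.RemovedFromQueue}");
    }
}
```

With a max dequeue count of 5 and the run capped at 8 attempts, the unguarded model keeps the message and adds a poison copy on every attempt from the fifth onward, while the guarded model removes the message on the sixth attempt with a single poison copy, which matches the "almost expected" behavior described in the edit.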

Solution

I'm about to give you an answer without references, based on personal experience and what I've seen on Stack Overflow. You are not the first person to have issues with automatic dead-lettering and with the max dequeue count being honored by WebJob QueueTriggerAttributes. My recommendation is to sidestep the flakiness of Storage Queues + QueueTriggers in favor of Service Bus Queues and Service Bus Triggers.

As a messaging technology, Service Bus Queues are much more full-featured and comparable in cost. The only real reason I'd choose Storage Queues over Service Bus Queues is if you need to store more than 80 GB of messages, which is the Service Bus Queue size limit with partitioning.