Search code examples
azureazure-functionsazure-storage-queues

Azure function visibilityTimeout


When i read the documentation about visibilityTimeout: https://learn.microsoft.com/en-us/azure/azure-functions/functions-bindings-storage-queue#host-json it says "The time interval between retries when processing of a message fails.". How I understands this is that if the timeout is set to 30 seconds and my function runs for 1 minute but doesn't fail in that 1 minute period, the message doesn't get visible to others in the queue. But when I read up on it by others sources (stackoverflow fx) it tells me the opposite, that when the execution time of the function exceeds the timeout, the message becomes visible EVEN though the function is still processing the message.

What is the truth? Is the timeout only relevant when the function isn't running more (and maybe have failed) or can it happen that the message gets visible again even though the function is still running?

What doesn't makes sense either, if we assume that the message gets visible when timeout is reached, is that the default timeout is 00:00:00 which implies that the message is visible at the same moment it is dequeued. This contradicts what 3. party sources is saying.

I am a bit confused by this.


Solution

  • It appears there are actually two different visibility timeout values used here. Both are set by the Azure WebJobs SDK but only one is configurable.

    When the function fails

    The queues.visibilityTimeout configuration option would be more aptly named retryDelay.

    When the function throws an exception or fails for some other kind of error, the message gets returned to the queue to be retried. The message is returned with the configured visibilityTimeout (see here), which delays when the function will next attempt to run.

    This allows your application to cope with transient errors. For example, if an email API or other external service is temporarily down. By delaying the retry, there is a chance that the service may be back online for the next function attempt.

    Retry is limited to maxDequeueCount attempts (5 default) before a message is moved to the Poison Queue.

    While the function is running

    When the QueueTrigger binding runs the function, it dequeues the message with a visibility timeout of 10mins (hard-coded here). It then sets a timer to extend the visibility window when it reaches half-time as long as the function is running (see the timer and visibility update in the source).

    Ordinarily you don't need to worry about this as long as your functions use CancellationTokens correctly. This 10min timeout only matters if the Azure Function/WebJob host doesn't get to shut down gracefully. For example:

    So, as long as the function is still running, the message will remain hidden from the queue.

    Verification

    I did a similar experiment to check:

    [FunctionName("SlowJob")]
    public async Task Run(
        [QueueTrigger("slow-job-queue")] CloudQueueMessage message,
        ILogger log)
    {
        for (var i = 0; i < 20; i++)
        {
            log.LogInformation($"Next visible {i}: {message.NextVisibleTime}");
            await Task.Delay(60000);
        }
    }
    

    Output:

    Next visible 0: 5/11/2020 7:49:24 +00:00
    Next visible 1: 5/11/2020 7:49:24 +00:00
    Next visible 2: 5/11/2020 7:49:24 +00:00
    Next visible 3: 5/11/2020 7:49:24 +00:00
    Next visible 4: 5/11/2020 7:49:24 +00:00
    Next visible 5: 5/11/2020 7:54:24 +00:00
    Next visible 6: 5/11/2020 7:54:24 +00:00
    Next visible 7: 5/11/2020 7:54:24 +00:00
    Next visible 8: 5/11/2020 7:54:24 +00:00
    Next visible 9: 5/11/2020 7:54:24 +00:00
    Next visible 10: 5/11/2020 7:59:24 +00:00
    Next visible 11: 5/11/2020 7:59:24 +00:00
    ...