node.js azure azure-functions azure-storage-queues

Add thousands of messages to an Azure Storage Queue


I am trying to add about 6000 messages to my Azure Storage Queue in an Azure Function with Node.js.

I have tried multiple ways to do this. Right now I wrap the QueueService method in a Promise and resolve the 6000 promises through Bluebird's Promise.map with a concurrency of about 50.

const addMessages = Promise.map(messages, (msg) => {
  //returns a promise wrapping the Azure QueueService method
  return myQueueService.addMessage(msg);
}, { concurrency: 50 });

// This returns a promise that resolves when all promises have resolved.
// It rejects as soon as one of the promises rejects.
addMessages.then((results) => {
  console.log("SUCCESS");
}, (error) => {
  console.log("ERROR");
});

My QueueService is created with an ExponentialRetry policy.
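For reference, the wrapper mentioned in the code comment above could look roughly like the following. This is a sketch rather than the original code: it assumes the callback-style `createMessage(queueName, text, callback)` method of the legacy `azure-storage` package, and `makeAddMessage`, `fakeQueueService`, and the queue name `my-queue` are hypothetical stand-ins so the example runs without a storage account.

```javascript
// Wraps a Node-style callback method (error, result) in a Promise.
// `queueService` is assumed to expose createMessage(queueName, text, callback),
// as the legacy azure-storage QueueService does.
function makeAddMessage(queueService, queueName) {
  return function addMessage(msg) {
    return new Promise((resolve, reject) => {
      queueService.createMessage(queueName, JSON.stringify(msg), (error, result) => {
        if (error) {
          reject(error);
        } else {
          resolve(result);
        }
      });
    });
  };
}

// Hypothetical in-memory stand-in for a real QueueService, for illustration only.
const fakeQueueService = {
  stored: [],
  createMessage(queueName, text, callback) {
    this.stored.push({ queueName, text });
    // Call back asynchronously, as a network client would.
    setImmediate(() => callback(null, { messageText: text }));
  },
};

const addMessage = makeAddMessage(fakeQueueService, "my-queue");
```

If the underlying callback is never invoked, a promise built this way stays pending forever, which would match the "does not resolve (or reject)" symptom described below.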


I have had mixed results using this strategy:

  • All messages get added to my queue and the promise resolves correctly.
  • All messages get added to my queue and the promise does not resolve (or reject).
  • Not all messages get added to my queue and the promise does not resolve (or reject).

Am I missing something or is it possible for my calls to sometimes take 2 minutes to resolve and sometimes more than 10 minutes?

In the future, I will probably have to add about 100,000 messages, so I'm kind of worried about the unpredictable results I have now.

What would be the best strategy to add a large number of messages in Node (in an Azure Function)?


EDIT:

Not sure how I missed this, but a pretty reliable way to add my messages to my Storage Queue is to use the queue output binding of my Azure Function:

https://learn.microsoft.com/en-us/azure/azure-functions/functions-bindings-storage-queue#storage-queue-output-binding

Makes my code a lot easier as well!

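// an output binding starts out undefined, so initialize it as an array first
context.bindings.outputQueue = [];
for (var i = 0; i < messages.length; i++) {
  // add each message to queue
  context.bindings.outputQueue.push(messages[i]);
}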

EDIT2:

I am going to split up my messages into batches of about 1000 and store these batches in Azure Blob Storage.

Another Azure Function can be triggered each time a new blob is added, and this function will handle the queueing of my messages, 1000 at a time.

This should make my queueing much more reliable and scalable: when I tried adding 20,000 messages to my queue through my output binding, I received an Azure Function timeout after 5 minutes, having processed only about 15,000 messages.
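The batching step could be sketched like this. It is only an illustration: `chunk` is a hypothetical helper, the sample `messages` array stands in for the real data, and the blob name in the comment is made up.

```javascript
// Splits an array into consecutive batches of at most `size` items.
function chunk(items, size) {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError("size must be a positive integer");
  }
  const batches = [];
  for (let start = 0; start < items.length; start += size) {
    batches.push(items.slice(start, start + size));
  }
  return batches;
}

// Hypothetical sample data standing in for the real messages.
const messages = Array.from({ length: 20000 }, (_, i) => ({ id: i }));
const batches = chunk(messages, 1000);

// Each batch would then be serialized and stored as one blob, for example
// JSON.stringify(batches[0]) under a hypothetical name such as "batch-0.json".
console.log(batches.length); // 20
```

The last batch simply ends up smaller when the total is not a multiple of the batch size.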


Solution

  • What triggers this function? What I would recommend, instead of having a single function add all of those messages, is to fan out and allow those functions to scale and take better advantage of concurrency by limiting the amount of work they're doing.

    With what I'm proposing above, the function that handles the trigger you have in place today would queue up the work, which would in turn be processed by another function that performs the actual work of adding a (much) smaller number of messages to the queue. You may need to play with the numbers to see what works well for your workload, but this pattern would allow those functions to scale better (including across multiple machines), handle failures better, and improve reliability and predictability.

    As an example, you could include the number of messages to add in the message you queue to trigger the work: if you wanted 1000 messages as the final output, you could queue 10 messages, each instructing a "worker" function to add 100 messages. I would also recommend playing with much smaller numbers per function.

    I hope this helps!
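The fan-out described in this answer can be sketched as follows. This is a rough illustration under assumptions, not a definitive implementation: `planWork`, `handleWorkItem`, `loadMessages`, and the `offset`/`count` fields are hypothetical names, and `outputQueue` is the output binding name used in the question.

```javascript
// Dispatcher side: describe the work as small items instead of doing it all at once.
function planWork(totalMessages, perWorker) {
  const workItems = [];
  for (let offset = 0; offset < totalMessages; offset += perWorker) {
    workItems.push({ offset, count: Math.min(perWorker, totalMessages - offset) });
  }
  return workItems;
}

// Worker side: each invocation queues only its own slice of the messages.
// `loadMessages` is a hypothetical function that returns the messages for a slice.
function handleWorkItem(context, workItem, loadMessages) {
  context.bindings.outputQueue = loadMessages(workItem.offset, workItem.count);
}

// 1000 messages in total, 100 per worker.
const workItems = planWork(1000, 100);
console.log(workItems.length); // 10
```

In this sketch the dispatcher would write `workItems` to a queue through its own output binding, and each queued item would trigger one worker invocation.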