I have a tricky situation where I need a process to run on a queue of messages in the order that they arrive in the queue, but there are going to be several hundred queues that need to be dynamically created.
Each message relates to a customer, so I would have a queue per customer.
I need a way of having a queue trigger for each queue, but there are potentially going to be several thousand queues.
Queue triggers appear to require the queue name to be hardcoded.
Obviously I can’t have a function in my code for each queue name
Has anyone got any idea of how I can do this?
Let’s say we have 3 customers with 5 messages each
This would be 3 queues with 5 messages
I want to be able to process all 3 customers at once, not have everything in a single queue. The customers are not related, so the messages being queued have nothing to do with each other at the customer level, and I don't want customer 3's messages having to wait whilst customer 1 and 2's are being processed.
Does anyone know how to do this?
Paul
The conceptual solution to this problem is to establish a fixed set of queues and to manage which queues the messages go into, to ensure optimal performance. We call this partitioning, and instead of rolling your own, most of the Azure Service Bus entities have support for partitioning built in: see Partitioned queues and topics.
It is also not necessary to hardcode connections or queue names; you can make these configuration settings (see the sketch below). Then, instead of directly coding all queues and connections, you could split the deployment over multiple configured instances. While still unmanageable IMO, it moves this from a coding problem to a deployment problem.
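As an illustration of what a configuration-driven trigger can look like, here is a minimal sketch using the Python v2 Azure Functions programming model, where the queue name in the trigger is a `%…%` binding expression resolved from app settings at deployment time. The setting names `QueueName` and `ServiceBusConnection` are assumptions for the example, not anything from the question:

```python
# Sketch: a Service Bus trigger whose queue name comes from configuration,
# not code. "QueueName" and "ServiceBusConnection" are assumed app settings;
# each deployed instance or slot can point at a different queue.
import azure.functions as func

app = func.FunctionApp()

@app.service_bus_queue_trigger(
    arg_name="msg",
    queue_name="%QueueName%",           # resolved from app settings
    connection="ServiceBusConnection",  # name of the connection-string setting
)
def process_customer_message(msg: func.ServiceBusMessage):
    # A single method implementation; which queue it listens to is purely
    # a matter of deployment configuration.
    body = msg.get_body().decode("utf-8")
    print(f"processing: {body}")
```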
Let me convince you not to implement dedicated queues per message producer
When a customer presents with this requirement:
> I want to be able to process all 3 customers at once, not have everything in a single queue. The customers are not related, so the messages being queued have nothing to do with each other at the customer level, and I don't want customer 3's messages having to wait whilst customer 1 and 2's are being processed.
My first response is "Why?" What is the likelihood that multiple customers will make requests at the same time, and what will be the frequency of those requests? How long does it take to process a single message, and how long is it acceptable for the customer to wait for a response?
When we are dealing with a scale of, say, 10,000 message generators, it is expensive to think linearly. Even though 1 queue per customer allows the potential for parallel processing, the queues, and the management of them, grow linearly. It helps to think differently: about batching, partitioning and sharding to optimise utilisation of a smaller set of resources in a manageable way.
The reality of how code is executed is that there is always some impact from each message on the others. Even if you spawn dedicated threads, there are limits to the number of threads your process can consume, and especially with hyper-threading and virtualisation there is always some level of impact from each message on the processing of others, unless you deploy to isolated servers. Attempting to service this number of customers without each impacting the other using individual queues is going to require processing across multiple instances, which means you will always need some level of orchestration at runtime.
The problem with a high (or unbounded) number of queues is that you need to manage them. To ensure sequential consistency you would need to spawn a separate thread to monitor each queue and receive messages. If each queue is likely to receive messages at high frequency, this might be justified, but as soon as you introduce locks to ensure that a receiving thread is the only one processing messages for a given sender, there is no reason that any idle thread couldn't process any message.
Consider industrial IoT implementations that process streams of messages from tens of thousands of devices, with throughput measured in thousands of messages per minute. According to your model, I would need thousands of threads or methods configured to receive messages. I would probably want to spread the load over multiple servers, but how would I know which server had registered listeners for each sender? Then consider the availability of this solution: if one of the servers goes down, how will I detect that, and how will I know which senders to subscribe to again?
The power of the Azure platform comes from designing solutions so that they can be deployed in a way where the platform can manage your allocation of resources for you and move workloads from busy resources to idle resources. Your requested solution becomes a lot harder to scale out, and your scale metric is now the number of message senders (customers), not the throughput at all, which becomes wasteful and expensive.
Partitioning is a trade-off: your idea describes the optimal theoretical throughput, but requires infinite access to resources. Partitioning allows us to optimise throughput by utilising a much lower, and potentially constant, amount of resources, all while still maintaining sequential consistency.
Each customer would be mapped to a partition, determined by the partition key. We try not to constrain specific message generators to specific partitions; the platform will do this for us and will ensure that messages with the same partition key are always sent to the same internal partitioned queue.
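For illustration, stamping the customer's id onto each message as the partition key might look like the following sketch with the azure-servicebus Python package; the connection string setting, queue name and customer id are all placeholders:

```python
# Sketch: sending with the customer id as the partition key, so that one
# customer's messages always land on the same internal partition and keep
# their ordering. All names and values here are hypothetical.
import os

from azure.servicebus import ServiceBusClient, ServiceBusMessage

CONN_STR = os.environ["SERVICEBUS_CONNECTION"]  # assumed app setting

with ServiceBusClient.from_connection_string(CONN_STR) as client:
    with client.get_queue_sender(queue_name="customer-messages") as sender:
        message = ServiceBusMessage(
            b'{"action": "update-order"}',
            partition_key="customer-42",  # same key -> same partition -> same order
        )
        sender.send_messages(message)
```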
Note: this is an over-simplification. In many configurations, if a partition goes offline, messages can be re-routed to another partition; the overall ordering is effectively restarted, and around this event some newer messages might be processed before older ones from the previous partition.
We can still process each partition in isolation, but this means that if 2 customers are assigned to the same partition, the processing of the first customer's message will delay processing of the second's. This is the trade-off: processing 1 message no longer holds up all other customers, only those customers in the same partition.
Partitioned entities limitations
Currently Service Bus imposes the following limitations on partitioned queues and topics:
- Service Bus currently allows up to 100 partitioned queues or topics per namespace for the Basic and Standard SKU.
- Each partitioned queue or topic counts towards the quota of 10,000 entities per namespace.
Partitioning helps us manage the problem of splitting the processing over multiple servers or processing instances. Using the SDK, each instance, when idle, will poll the partitions and attempt to acquire a lock, establishing a singleton connection to a specific partition. This prevents other processes from accessing that partition. When that instance goes idle, or the lease on the partition ends, the connection is dropped and the next idle instance will pick it up.
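On Service Bus itself, the primitive that gives you this exclusive, lock-per-sender receive behaviour is the message session (on a partitioned, session-enabled queue the session id also serves as the partition key). A minimal competing-consumer sketch with the azure-servicebus Python package; the connection string and queue name are assumed settings:

```python
# Sketch: each instance repeatedly takes an exclusive lock on whichever
# session (customer) has messages waiting, drains it in arrival order,
# then moves on. Assumes a session-enabled queue; names are hypothetical.
import os

from azure.servicebus import ServiceBusClient, NEXT_AVAILABLE_SESSION
from azure.servicebus.exceptions import OperationTimeoutError

CONN_STR = os.environ["SERVICEBUS_CONNECTION"]  # assumed app setting
QUEUE = os.environ["QUEUE_NAME"]                # assumed app setting

def process(message):
    print("handling", str(message))

with ServiceBusClient.from_connection_string(CONN_STR) as client:
    while True:
        try:
            # The session lock makes this receiver the only consumer of
            # this customer's messages until the session is released.
            with client.get_queue_receiver(
                queue_name=QUEUE,
                session_id=NEXT_AVAILABLE_SESSION,
                max_wait_time=5,
            ) as receiver:
                for msg in receiver:  # in-order within the session
                    process(msg)
                    receiver.complete_message(msg)
        except OperationTimeoutError:
            continue  # no session had messages; poll again
```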
There is a bit going on under the hood, but this means that we can effectively scale out to a number of instances and the load will be routed for us without having to write too much code.
For IoT-level throughput, consider using Event Hubs and EventProcessorHost.
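In the current Python SDK the EventProcessorHost pattern is exposed through EventHubConsumerClient with a blob-storage checkpoint store, which handles the partition leasing and fail-over described above. All connection strings and names below are placeholders:

```python
# Sketch: partition-lease based consumption of an event hub, the current
# equivalent of EventProcessorHost. Requires the azure-eventhub and
# azure-eventhub-checkpointstoreblob packages; every name is hypothetical.
from azure.eventhub import EventHubConsumerClient
from azure.eventhub.extensions.checkpointstoreblob import BlobCheckpointStore

checkpoint_store = BlobCheckpointStore.from_connection_string(
    "<storage-connection-string>", "<blob-container-name>"
)

client = EventHubConsumerClient.from_connection_string(
    "<eventhub-connection-string>",
    consumer_group="$Default",
    eventhub_name="<hub-name>",
    checkpoint_store=checkpoint_store,  # leases and checkpoints in blob storage
)

def on_event(partition_context, event):
    # Events within a partition arrive in order; partitions are balanced
    # across however many instances of this process are running.
    print(partition_context.partition_id, event.body_as_str())
    partition_context.update_checkpoint(event)

with client:
    client.receive(on_event=on_event, starting_position="-1")  # from the start
```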
If you really need more than 100 partitions, you simply create another level of partitioning by splitting customers between multiple partitioned queues: 10 named queues could service 1,000 partitions. If you really needed to service 10,000 customers and ensure that no customer is waiting for another customer's message processing, then you would need 100 named queues. That at least is more manageable than 10,000 queues. A sketch of this routing follows.
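The routing itself is just a stable hash of the customer id onto a fixed set of queue names, something like this (the queue naming scheme is invented for the example; hashlib is used because Python's built-in hash() is randomised per process):

```python
# Sketch: a second level of partitioning, hashing the customer id to one of
# a fixed set of partitioned queues. Queue names are invented for the example.
import hashlib

QUEUE_NAMES = [f"customers-{i:02d}" for i in range(10)]  # 10 x 100 partitions

def queue_for(customer_id: str) -> str:
    digest = hashlib.sha256(customer_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(QUEUE_NAMES)
    return QUEUE_NAMES[index]

# Within the chosen queue the customer id is still the partition key /
# session id, so each customer keeps strict ordering.
print(queue_for("customer-42"))  # always the same queue for this customer
```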
In most implementations, although theoretically undesirable, the practical implication of 1 customer waiting 200 milliseconds for another message to process first is going to be negligible; 3 seconds might even be acceptable, as long as we maintain the consistency of the sequence. Depending on how long it takes to process each message, and the frequency with which messages are generated, you might find it acceptable for dozens of message generators to be assigned to the same partition (remember, each message generator/customer has their own unique partition key; the platform will assign them to the available partitions using logic based on that key).
When dealing with paying customers, you might consider that some could pay a premium for dedicated resources, or for access to environments with a lower number of concurrent customers. For this you might even go so far as deploying a separate service bus and limiting the number of customers who send messages to it, or you might just create a separate queue, again limiting the number of customers who send messages to that queue.
Remember that the connection string, queue name and partition can be allocated through configuration, so your code could be a single method implementation that you deploy to multiple app services, each with a different configuration, or you might use deployment slots, again with each slot assigned a different configuration.
If you really think that you require the scale, you might use a combination of these techniques or you might automate the deployment so that new queues are created and the same code deployed to dedicated instances, with the only difference being the configuration.
Really stop to analyse the actual requirement and the acceptable tolerances, in terms of delays to message delivery and how you want to handle failure. Partitioning is a pragmatic solution that makes it possible to achieve near real-time processing for a reasonable runtime cost and development effort.