Are there any recommended architectural patterns with Service Bus for ensuring ordered processing of nested groups of messages which are sent out of order? We are using Sessions, but when it comes to ensuring that one set of Sessions is processed completely before moving on to another set of Sessions, the architecture becomes cumbersome very quickly. This question is best illustrated with an example.
We are using Service Bus to integrate changes in real-time from a database to a third-party API. Every N minutes, we get notified of a new 'batch' of changes from the database which consists of individual records of data across different entities. We then transform/map each record and send it along to an API. For example, a 'batch' of changes might include 5 new/changed 'Person' records, 3 new/changed 'Membership' records, etc.
At the outer-most level, we must always process one entire batch before we can move on to another batch of data, but we also have a requirement to process each type of entity in a certain order. For example, all 'Person' changes must be processed for a given batch before we can move on to any other objects.
There is no guarantee that these records will be queued up in any order which is relevant to how they will need to be processed, particularly within a 'batch' of changes (e.g. the data from different entity types will be interleaved).
We do not actually need to send the individual records of entity data to the API in any particular order (e.g. it does not matter in which order I send those 5 Person records for that batch, as long as they are all sent before the 3 Membership records for that batch). However, we do group the messages into Sessions by entity type, so that we can guarantee homogeneous records in a given session and address all records for that entity type together. (This also supports a separate requirement we have when calling the API: send a batch of records when possible, instead of an individual call per record, to avoid API rate limiting issues.) Currently, our actual Topic Subscription containing the record data is broken up into Sessions which are unique to the entity type and the batch:
"SessionId": "Batch1234\Person"
We are finding it cumbersome to manage the requirement that all changes for a given batch must be processed before we move on to the next batch, because there is no Session which reliably groups those "groups of entities" together (let alone processes those groups of entities themselves in a certain order). There is, of course, no concept of a 'session of sessions'. We are currently handling this by having a separate 'Sync' queue whose messages represent an entire batch of changes that needs to be processed, along with which sessions of data are contained in that batch:
"SessionId": "Batch1234",
"Body":
{
"targets": ["Batch1234\Person", "Batch1234\Membership", ...]
}
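Producing that manifest message looks roughly like this (again a simplified sketch; the 'sync' queue name and helper are illustrative only):

    import json
    from azure.servicebus import ServiceBusClient, ServiceBusMessage

    SYNC_QUEUE = "sync"  # illustrative queue name

    def announce_batch(client, batch_id, entity_types):
        # One manifest message per batch; 'targets' lists the record
        # sessions in the order they must be processed.
        body = {"targets": [f"{batch_id}\\{et}" for et in entity_types]}
        with client.get_queue_sender(queue_name=SYNC_QUEUE) as sender:
            sender.send_messages(
                ServiceBusMessage(json.dumps(body), session_id=batch_id)
            )

    # e.g. announce_batch(client, "Batch1234", ["Person", "Membership"])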
This is quite cumbersome, because something (e.g. a Durable Azure Function) now has to orchestrate the entire process by watching the Sync queue and then spinning off separate processors that it oversees to ensure correct ordering at each level (which makes concurrency management and scalability much more complicated to deal with). If this is indeed a good pattern, then I do not mind implementing the extra orchestration architecture to ensure a robust, scalable implementation. However, I cannot help feeling that I am missing something or not thinking about the architecture the right way.
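To give a sense of the orchestration involved, the Durable Functions orchestrator is conceptually something like this (Python durable-functions; "ProcessRecordSession" is a made-up activity that drains one record session):

    import azure.durable_functions as df

    def orchestrator(context: df.DurableOrchestrationContext):
        # Input: the manifest body from the Sync queue message.
        manifest = context.get_input()
        for session_id in manifest["targets"]:
            # Sequential yields enforce the ordering between
            # entity-type sessions.
            yield context.call_activity("ProcessRecordSession", session_id)

    main = df.Orchestrator.create(orchestrator)

The sequential yields give the batch-level ordering; fanning out within an entity type (e.g. with context.task_all) is exactly where the concurrency management gets complicated.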
Is anyone aware of any other recommended pattern(s) in Service Bus for handling ordered processing of groups of data which themselves contain groups of data which must be processed in a certain order?
For the record, I'm not a Service Bus expert, specifically.
The entire batch construct sounds painful - can you do away with it? Often if you have a painful input, you'll have a painful solution - the old "crap in, crap out" maxim. Sometimes it's just hard to find an elegant solution.
Do the 'sets of sessions' need to be processed in a specific order? Is a 'batch' of changes = a session?
I can't think of a specific pattern, but a "divide and conquer" approach seems reasonable (which is roughly what you have already?):
BatchProcessor:
- applies all the rules to the batch, as you outlined
- dumps its results on a queue of some kind which is the source for the API - that way you have some kind of isolation between the batch processing and the API (rough sketch below)
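A rough sketch of that second step (Python azure-servicebus; every name here is invented, and draining-by-idle-timeout is just one simplistic way to detect the end of a session):

    import json
    from azure.servicebus import ServiceBusClient, ServiceBusMessage

    # All names here are invented.
    TOPIC, SUB, OUT_QUEUE = "record-changes", "api-sync", "api-outbound"

    def process_session(client, session_id):
        # Drain one "<batch>\<entity>" session, then hand the aggregated
        # payload to an outbound queue that feeds the API caller.
        records = []
        with client.get_subscription_receiver(
            topic_name=TOPIC, subscription_name=SUB, session_id=session_id
        ) as receiver:
            while True:
                msgs = receiver.receive_messages(max_message_count=50,
                                                 max_wait_time=5)
                if not msgs:
                    break  # treat an idle wait as "session drained" (heuristic)
                for msg in msgs:
                    records.append(json.loads(str(msg)))
                    receiver.complete_message(msg)
        with client.get_queue_sender(queue_name=OUT_QUEUE) as sender:
            # One aggregated message per session supports the batched API call.
            sender.send_messages(ServiceBusMessage(json.dumps(records)))

The outbound queue keeps API rate limiting and retry concerns out of the batch ordering logic.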