I have a use case where I need to process a set of events. I need them processed in parallel overall, but serially for each user. Can this be done in PubSub (maybe GCP Tasks?)?
For Example:
6 events come in at one time (User_A_Event_1, User_A_Event_2, User_B_Event_1, User_B_Event_2, User_C_Event_1, User_D_Event_1).
I want to group them by UserID, process the users in parallel, and process each user's events serially (processing of the next event won't begin until the prior event has completed successfully). Something like:
If it matters, I have no idea which users will have events or at what times. We might go months without seeing any events for a user and then start getting lots of them.
I am trying to figure out a way to accomplish this in GCP PubSub but I am open to other solutions as well. My preference would be to do this through a push instead of a pull as I could go long periods of time with nothing in the queue.
Appreciate your help.
Craig
Cloud Pub/Sub's ordered delivery could help here. You would use the user as the ordering key. This would mean that Cloud Pub/Sub would deliver the messages to your subscribers in the order in which they were received by the service from your publishers. Ordered delivery would have the properties you desire where you don't know the set of users in advance and where events for a particular user can be rare or bursty.
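As a rough sketch of the publishing side, assuming the `google-cloud-pubsub` Python client: ordering is switched on in the publisher options and the user ID is passed as `ordering_key`. The helper names `make_ordered_publisher` and `publish_user_event`, and the JSON encoding of the event, are illustrative choices, not part of the library. Message ordering also has to be enabled on the subscription itself.

```python
import json


def make_ordered_publisher():
    """Create a publisher client with message ordering enabled.

    Needs the google-cloud-pubsub package. It is imported lazily so the
    helper below can be exercised with any object that has publish().
    """
    from google.cloud import pubsub_v1

    return pubsub_v1.PublisherClient(
        publisher_options=pubsub_v1.types.PublisherOptions(
            enable_message_ordering=True
        )
    )


def publish_user_event(publisher, topic_path, user_id, event):
    """Publish one event, using the user ID as the ordering key."""
    # Assumption: events are JSON-serialisable dicts.
    data = json.dumps(event).encode("utf-8")
    # Messages that share an ordering key are delivered in publish order;
    # messages with different keys are independent of each other.
    return publisher.publish(topic_path, data, ordering_key=user_id)
```

With a real client, `topic_path` would come from `publisher.topic_path(project_id, topic_id)`, and the returned future's `result()` gives the message ID.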
On the subscribe side, the guarantees made depend on the type of subscriber. For the client libraries (which use streaming pull), the callback you provide will be executed to completion for messages with the same key one at a time. For subscribers using pull, each pull request will contain messages for a key in the order in which they were received and a key's messages will only be outstanding in one pull response at a time. For push subscribers, each message for an ordering key will be sent individually to your endpoint and the next message will not be sent until the previous message for the same key is acknowledged.
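To make the per-key behaviour concrete, here is a small in-process model using only the Python standard library. It is not the Pub/Sub client; it only mimics the shape of the guarantee described above (one key's messages handled one at a time, different keys handled concurrently). The name `process_by_user` and the `(user_id, payload)` tuple format are assumptions made for the illustration.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def process_by_user(events, handler, max_workers=4):
    """Run handler serially within a user and in parallel across users.

    events: iterable of (user_id, payload) pairs in arrival order.
    Returns {user_id: [handler results in arrival order]}.
    """
    # Group events by user while keeping arrival order inside each group.
    per_user = defaultdict(list)
    for user_id, payload in events:
        per_user[user_id].append(payload)

    def drain(user_id):
        # One worker owns a user's whole queue, so event N+1 starts only
        # after event N has returned without raising.
        return [handler(user_id, p) for p in per_user[user_id]]

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {u: pool.submit(drain, u) for u in per_user}
        # result() re-raises any exception from that user's handler.
        return {u: f.result() for u, f in futures.items()}
```

Using the six events from the question:

```python
events = [
    ("User_A", "Event_1"), ("User_A", "Event_2"),
    ("User_B", "Event_1"), ("User_B", "Event_2"),
    ("User_C", "Event_1"), ("User_D", "Event_1"),
]
results = process_by_user(events, lambda user, event: f"{user}_{event}")
```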
Note that Cloud Pub/Sub's ordered delivery still has at-least-once delivery semantics, meaning that an acknowledged message could be redelivered, which would also result in the redelivery of subsequent messages for the same key.
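Because of the at-least-once semantics, the handler should tolerate seeing the same message twice. One common approach, sketched here under assumptions, is to remember the IDs of messages already processed and skip repeats. `IdempotentHandler` is a hypothetical helper, and the in-memory set is only for illustration; a real deployment would likely need a durable store shared by all subscriber instances.

```python
class IdempotentHandler:
    """Wrap a handler so that a redelivered message is processed only once."""

    def __init__(self, handler):
        self._handler = handler
        self._seen = set()  # assumption: in-memory, single process

    def __call__(self, message_id, payload):
        """Return True if the payload was processed, False if it was a repeat."""
        if message_id in self._seen:
            return False
        self._handler(payload)
        # Record the ID only after the handler succeeded, so a failed
        # attempt is retried on redelivery instead of being skipped.
        self._seen.add(message_id)
        return True
```

A repeat can then simply be acknowledged again without redoing the work.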
See the Medium post about ordering for more details.