I've run into this problem a few times, and now I wonder what the industry best practice is. The context: we have a data store that aggregates pieces of information taken from multiple microservices. The data reaches us through messages that each source broadcasts whenever something changes.
The problem is how to guarantee that our data will be eventually consistent and that updates are applied in the order in which they were meant to be received. For example, let's say we have an entity User:
User {
display_name : String,
email: String,
bio: String
}
We listen for changes on those users to keep "display_name" updated in our data store. The messages come in a format such as:
{
event: "UserCreated",
id: 1000,
display_name: "MyNewUser"
}
{
event: "UserChanged",
id: 1000,
display_name: "MyNewUser2"
}
There is a scenario where "UserChanged" reaches our listeners before "UserCreated", so our code can't find the user with id 1000 and both transactions fail. This is where a mechanism to order those two messages is desired. We have considered:
If anyone knows common design patterns that tackle this sort of issue, it would be great to hear about them. I'm also open to suggestions on data modeling, etc. Bottom line: I'm pretty sure this is a common software problem that has been solved many times before.
Thanks a lot for the help!
My first thought here would be to jump directly to a sequence-number-based approach, but that works when you have one-to-one communication, as in TCP-oriented communication. In your case it is many-to-one, so without coordination between the senders it would be challenging to implement this approach correctly (e.g. two senders can use the same sequence number).
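To make the sequence-number idea concrete, here is a minimal sketch of a reorder buffer that tracks sequence numbers per sender. It assumes each message carries a `sender_id` and a gap-free `seq` starting at 1, neither of which exists in your current message format; the class and field names are hypothetical.

```python
from collections import defaultdict


class PerSenderReorderBuffer:
    """Releases events in sequence order, tracked separately for each sender."""

    def __init__(self, first_seq=1):
        # sender_id -> next sequence number we expect from that sender
        self._next_seq = defaultdict(lambda: first_seq)
        # sender_id -> {seq: event} for events that arrived too early
        self._pending = defaultdict(dict)

    def receive(self, sender_id, seq, event):
        """Buffer the event and return every event that is now deliverable, in order."""
        if seq < self._next_seq[sender_id]:
            return []  # already delivered earlier: treat as a duplicate and drop it
        self._pending[sender_id][seq] = event
        ready = []
        # Drain the buffer for as long as the next expected number is present.
        while self._next_seq[sender_id] in self._pending[sender_id]:
            ready.append(self._pending[sender_id].pop(self._next_seq[sender_id]))
            self._next_seq[sender_id] += 1
        return ready


buf = PerSenderReorderBuffer()
print(buf.receive("users", 2, "UserChanged"))   # held back until seq 1 arrives
print(buf.receive("users", 1, "UserCreated"))   # releases both, in order
print(buf.receive("billing", 1, "OtherEvent"))  # same seq, different sender: no collision
```

Keying on the pair (sender, sequence number) avoids the collision, but note that it only orders events from the same sender. It says nothing about the relative order of events from different senders, and it keeps early arrivals in memory.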
Yes, losing messages would be problematic, but I don't think that's the case with SQS or other cloud-based message queues (of course, it depends on the scale you're working at), because they're known for duplicating data rather than losing it (AFAIK).
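If duplication is the more likely failure, the consumer can be made idempotent. The following is only a sketch: it assumes every message carries a unique `message_id` (a hypothetical field, not part of the format shown in the question) and it keeps the set of seen ids in memory, where a real consumer would need durable storage.

```python
class DisplayNameStore:
    """Applies display_name updates at most once per message."""

    def __init__(self):
        self.display_names = {}  # user id -> display_name
        self._seen = set()       # message_id values that were already applied

    def apply(self, message):
        """Apply the message; return False if it was a duplicate and was skipped."""
        if message["message_id"] in self._seen:
            return False
        # Upsert, so "UserCreated" and "UserChanged" are handled the same way.
        self.display_names[message["id"]] = message["display_name"]
        self._seen.add(message["message_id"])
        return True


store = DisplayNameStore()
created = {"message_id": "a1", "event": "UserCreated", "id": 1000, "display_name": "MyNewUser"}
store.apply(created)
store.apply(created)  # redelivered copy is ignored
```

This handles redelivery only. It does not by itself fix ordering: a late "UserCreated" would still overwrite a newer "UserChanged".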
One idea I can think of right now is to add a new layer between the senders and the consumer that orchestrates the events. It can be the consumer itself, or it can be another service in front of it; let's call it the orchestrator.
The orchestrator is connected to each sender (individually) via 2 queues:
The way it works is the following:
This solution ensures message ordering in a pretty basic way, keeping the events in the queues rather than in memory. It may work, but it depends on a lot of factors, such as the number of events, the number of senders, etc.