Tags: asynchronous, design-patterns, architecture, messaging

How to keep order when consuming async messages (from SQS or any other messaging service)


I've encountered this problem a few times and now I wonder what the industry best practice is. The context: we have a data store that aggregates pieces of information taken from multiple micro-services, and the data comes to us through messages broadcast by each source whenever there is a change.

The problem is how to guarantee that our data will be eventually consistent and that the updates are applied in the order they were meant to be received. For example, let's say we have an entity User:

User {
   display_name : String,
   email: String,
   bio: String
}

And we are listening for changes on those users to keep "display_name" updated in our data store. The messages come in a format such as:

{
     event: "UserCreated",
     id: 1000,
     display_name: "MyNewUser"
}

{
     event: "UserChanged",
     id: 1000,
     display_name: "MyNewUser2"
}

There is a scenario where "UserChanged" reaches our listeners before "UserCreated", so our code won't be able to find the user with id 1000 and both transactions will fail. This is where a mechanism to sort those two is desired. We have considered:

  • Timestamps: The problem with timestamps is that although we know the last time we read an update, we don't know how many events happened between the last event seen and the one we are currently processing
  • Sequence numbers: This is slightly better, but if a sequence number is lost then we won't update our storage unless we relax the rules a little bit; we could say that after some time, if a sequence number hasn't been seen, we proceed with the rest of the operations (a rough sketch of this relaxed approach follows below)
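
To make the sequence-number option concrete, here is a rough sketch of what the relaxed rule could look like on the consumer side (the event shape, the apply() callback, the per-entity handling, and the 30-second timeout are only illustrative assumptions):

import time

MAX_WAIT_SECONDS = 30  # assumed: how long to wait for a missing sequence number

class OrderedApplier:
    """Buffers out-of-order events and applies them by sequence number."""

    def __init__(self, apply):
        self.apply = apply          # callback that writes an event to the data store
        self.next_seq = 0           # next sequence number we expect to apply
        self.buffer = {}            # seq -> event, for events that arrived early
        self.waiting_since = None   # when we started waiting for next_seq

    def on_event(self, event):
        # Called for every message pulled from the queue; a real consumer
        # would keep one OrderedApplier per entity (e.g. per user id).
        self.buffer[event["seq"]] = event
        self._drain()

    def _drain(self):
        # Apply every buffered event that is next in line.
        while self.next_seq in self.buffer:
            self.apply(self.buffer.pop(self.next_seq))
            self.next_seq += 1
            self.waiting_since = None

        # If we are stuck on a gap, remember when the wait started.
        if self.buffer and self.waiting_since is None:
            self.waiting_since = time.monotonic()

        # Relaxed rule: after MAX_WAIT_SECONDS, give up on the missing
        # sequence number and continue from the oldest buffered event.
        # (Checked here when events arrive; a real consumer would also
        # run this check on a timer.)
        if (self.waiting_since is not None
                and time.monotonic() - self.waiting_since > MAX_WAIT_SECONDS):
            self.next_seq = min(self.buffer)
            self.waiting_since = None
            self._drain()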

If anyone knows common design patterns that tackle this sort of issue, it would be great to hear about them; I'm also open to suggestions on data modeling, etc. Bottom line, I'm pretty sure this is a common software problem that has been solved many times before.

Thanks a lot for the help!


Solution

  • My first thought here would be to jump directly to a sequence-number-based approach, but this works when you have 1-to-1 communication, as in TCP-oriented communication. In your case it is many-to-one, so without coordination between the senders it would be challenging to implement this approach correctly (e.g. two senders can use the same sequence number).
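
    For illustration only (this is a sketch, not anything from the question): if each sender keeps its own counter, the consumer can only track a per-sender "next expected" number, which orders events coming from one sender but says nothing about the relative order of events from different senders.

    # Hypothetical per-sender tracking: it orders events within one sender,
    # but two senders can both emit seq=5, so there is still no global order.
    expected = {}  # sender_id -> next sequence number we expect from that sender

    def arrived_in_order(event):
        nxt = expected.get(event["sender_id"], 0)
        if event["seq"] != nxt:
            return False  # out of order, or a gap we have not seen yet
        expected[event["sender_id"]] = nxt + 1
        return True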

    Yes, losing messages would be problematic, but I don't think that's the case with SQS or other cloud-based message queues (of course, it depends on the scale you're working at), because they're known for duplicating data rather than losing it (AFAIK).

    One idea I can think of right now is to add a new layer between the senders and the consumer that will orchestrate the events. It could be the consumer itself, but it could also be another service in front of it; let's call it the orchestrator.

    The orchestrator is connected to each sender (individually) via two queues:

    • The first queue is used to get the actual events from the sender
    • The second queue is used to signal back to the sender an ACK-like event (the message has been received, validated, and successfully passed downstream to the consumer, or consumed directly).

    The way it works is the following:

    • The orchestrator gets the event from sender A
    • It tries to execute a validation-like operation specific to the message (an update on a non-existent user); the operation fails, so it sends a NACK message back to sender A, signaling that its message could not be processed successfully. Sender A will try to resend the message after some time.
    • In the meantime, it gets the "create user" message from sender B, and that message gets passed downstream to the consumer
    • Finally, it gets the message from sender A (after some retries).

    This solution ensures message ordering in a pretty basic way, without keeping the events in memory, but rather in the queues. It may work, but it depends on a lot of factors, like the number of events, the number of senders, etc.
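
    To make the flow above concrete, here is a minimal sketch of the orchestrator loop for a single sender, assuming one worker per sender. The queue client (receive/send/delete), the store lookup, and the ACK/NACK message shapes are placeholders for illustration, not a real SQS API.

    def is_valid(event, store):
        # Validation specific to the message type: an update is only valid
        # if the user already exists in our data store.
        if event["event"] == "UserChanged":
            return store.user_exists(event["id"])
        return True

    def run_orchestrator(inbound_queue, ack_queue, downstream, store):
        # One worker per sender: inbound_queue carries that sender's events,
        # ack_queue carries ACK/NACK signals back to the same sender.
        while True:
            message = inbound_queue.receive()       # blocking receive
            event = message.body

            if is_valid(event, store):
                downstream.send(event)              # pass the event to the consumer
                ack_queue.send({"type": "ACK", "event_id": event["id"]})
            else:
                # e.g. "UserChanged" arrived before "UserCreated": signal the
                # sender to retry later instead of silently dropping the event.
                ack_queue.send({"type": "NACK", "event_id": event["id"]})

            inbound_queue.delete(message)           # remove the message either way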