I have a system where a potentially large number of clients make persistent streaming connections to an auto-scaling cluster of application servers in the cloud. Each server knows the id of each client connected to it.
At some point another system component will want to send a message to a specific one of the connected clients. A message might be sent before the client connects (in which case it should be held until it does). I'm trying to figure out the best routing/queue design to get that message to the correct server, but it doesn't seem to match up with the assumptions underlying existing messaging/pubsub/notification/etc. frameworks.
I could have a global lookup or registry that stores the mapping between each server instance and the list of client ids connected to it. Servers would be responsible for maintaining it as clients connect and disconnect, and message senders would be responsible for doing the lookup and initiating connections to the application servers. However, this has a bunch of undesirable characteristics I want to avoid, which is why I'm looking at queue-based designs instead.
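For concreteness, here's a minimal sketch of what that registry path could look like, assuming Redis as the shared store; the key names, TTL, and helper names are all made up for illustration:

```python
import redis

# Hypothetical shared registry mapping client id -> the server currently holding
# its connection. The Redis choice, key names, and TTL are illustrative only.
r = redis.Redis(host="registry.internal", port=6379, decode_responses=True)

def register_connection(client_id: str, server_id: str) -> None:
    # Refreshed periodically by the server; expires if the server dies silently.
    r.set(f"conn:{client_id}", server_id, ex=60)

def unregister_connection(client_id: str, server_id: str) -> None:
    # Check-then-delete: only remove the entry if this server still owns it
    # (the client may have already reconnected to another server). Note this
    # isn't atomic, which is one of the races I'd rather not deal with.
    if r.get(f"conn:{client_id}") == server_id:
        r.delete(f"conn:{client_id}")

def lookup_server(client_id: str) -> str | None:
    # Senders call this, then open a connection to the returned server.
    return r.get(f"conn:{client_id}")
```

Even this small sketch shows some of the problems: the disconnect path races with reconnects, and senders now have to open connections directly to whichever application server the lookup returns.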
There are a bunch of options here: AWS SQS, Google PubSub, RabbitMQ, etc. However, this use case doesn't seem to cleanly line up with how they're designed. One major design choice is whether to use a single global queue (or pubsub topic, etc.) with the target client id carried as an attribute or metadata on the message, a queue per server, or a queue per client.
Queue per Client: This probably has the cleanest semantics: a server subscribes to a client's queue when the client connects. It also has the benefit that the log of messages in the queue is exactly the log of messages sent to that client, so it's easier to trace and debug client sessions. However, this requires that queues be created dynamically, and ideally quickly, by either the sender or the subscriber (since it isn't tenable to pre-create a queue for every client id), depending on whether the message or the connection happens first, and that isn't a feature I can find.
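To make the shape of this option concrete, here's a rough sketch using a RabbitMQ-style API (pika), where declaring a queue is idempotent so either side can do it first; whether any broker actually supports this pattern at the scale of one queue per client is exactly the gap I'm describing:

```python
import pika

# Sketch only: assumes a RabbitMQ-style broker where queue_declare is idempotent,
# so whichever side acts first (sender or server) creates the per-client queue.
conn = pika.BlockingConnection(pika.ConnectionParameters("broker.internal"))
channel = conn.channel()

def send_to_client(client_id: str, payload: bytes) -> None:
    # Publisher declares the per-client queue in case the client hasn't connected yet,
    # then publishes to it via the default exchange.
    channel.queue_declare(queue=f"client.{client_id}", durable=True)
    channel.basic_publish(exchange="", routing_key=f"client.{client_id}", body=payload)

def on_client_connected(client_id: str, deliver_to_client) -> None:
    # Server declares (or re-declares) the same queue and starts consuming;
    # anything published before the client connected is already waiting here.
    channel.queue_declare(queue=f"client.{client_id}", durable=True)

    def handle(ch, method, properties, body):
        deliver_to_client(body)
        ch.basic_ack(delivery_tag=method.delivery_tag)

    channel.basic_consume(queue=f"client.{client_id}", on_message_callback=handle)
    # channel.start_consuming() runs in the server's main consume loop.
```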
Queue per Server: I could have a queue created whenever a new server spins up, and the publisher sends each message to all server queues (fanout). However, a server has to ack a message even when the recipient client isn't connected to it, which means that if the client isn't connected anywhere, every server acks its copy and the message is gone by the time the client connects. Garbage collecting queues as servers spin up and down also sounds unpleasant.
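To illustrate the drop, here's what the per-server consumer ends up looking like, sketched with the Google Pub/Sub Python client (the project and subscription names are placeholders):

```python
from google.cloud import pubsub_v1

# Illustrative only: one subscription per server on a fanout topic.
subscriber = pubsub_v1.SubscriberClient()
subscription_path = subscriber.subscription_path("my-project", "server-1234-queue")

connected_clients = {}  # client_id -> connection, maintained by the server elsewhere

def callback(message):
    client_id = message.attributes.get("client_id")
    conn = connected_clients.get(client_id)
    if conn is not None:
        conn.send(message.data)
    # Connected or not, this server has to ack its copy, otherwise the copy just
    # redelivers forever. So if *no* server has the client, all copies get acked
    # and the message is silently dropped.
    message.ack()

future = subscriber.subscribe(subscription_path, callback=callback)
# future.result() would block in the server's main loop.
```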
Global Queue: In this model there's one global queue for client messages and all of the application servers subscribe, with the one "correct" server (the one the client is actually connected to) handling and acking each message. But this isn't really how message queues are designed: they tend to assume that subscribers are interchangeable and any one of them can handle a given message, they make no particular guarantees about which subscriber receives it, and a subscriber declining to handle a message is treated as an exceptional case that introduces redelivery timeouts and other latency problems.
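The same consumer sketched against a single shared subscription shows the opposite problem: the only tool a server has for "not my client" is to nack (or let the ack deadline lapse) and hope the next redelivery lands on the right server. Again, the names here are placeholders:

```python
from google.cloud import pubsub_v1

# Illustrative only: every server shares one subscription on a single global topic,
# so the broker picks an arbitrary server for each message.
subscriber = pubsub_v1.SubscriberClient()
subscription_path = subscriber.subscription_path("my-project", "global-client-messages")

connected_clients = {}  # client_id -> connection, maintained by the server elsewhere

def callback(message):
    client_id = message.attributes.get("client_id")
    conn = connected_clients.get(client_id)
    if conn is not None:
        conn.send(message.data)
        message.ack()
    else:
        # "Not my client" isn't a supported outcome; nacking just triggers another
        # redelivery to some arbitrary server, adding unbounded latency and churn.
        message.nack()

future = subscriber.subscribe(subscription_path, callback=callback)
# future.result() would block in the server's main loop.
```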
Is there a framework that gets me what I want here? It doesn't seem that complicated.
SO suggested this as related: "GCP PubSub - broadcast message - only relevant subscriber handles message", and indeed that's basically the same requirement. But the accepted answer there is what I described above as Queue per Server, with the same problem: if the client isn't currently connected, the message is acked by all servers and dropped.
I came up with a design that involves a sequence of queues and should handle my requirements.
First, on Queue per Client, which I still think would be the best solution if there were a framework that supported lightweight queues created on demand by either subscribers or publishers: I still can't find one, and resource limits (e.g. GCP PubSub limits projects to 10,000 topics) suggest this is not how message queues are intended to be used.
So my solution is a combination of a global queue and a queue per server. Messages are published to a single global queue with the target client id attached as an attribute, and a fanout step delivers a copy of each message onto every server's queue.

When an application server gets a message from its Server Queue, it checks whether the recipient client is connected. If not, it acks the message off the Server Queue and ignores it. If the client is connected to that server, it sends the message to the client, acks it off the Server Queue, and additionally acks the original message off the global queue via a side-channel (this could be a Lambda, an RPC, or an additional global ack queue, which is what I'm leaning toward).
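Here's a rough sketch of the per-server consumer under this design, again using the Pub/Sub client purely for illustration; the queue names and the client_id / global_message_id attributes are assumptions, and the side-channel shown is the global ack queue variant rather than a Lambda or RPC:

```python
from google.cloud import pubsub_v1

# Sketch of the per-server consumer in the global-queue + server-queue design.
# Project, queue names, and attributes (client_id, global_message_id) are placeholders.
subscriber = pubsub_v1.SubscriberClient()
publisher = pubsub_v1.PublisherClient()

server_queue_path = subscriber.subscription_path("my-project", "server-1234-queue")
ack_topic_path = publisher.topic_path("my-project", "global-ack-queue")

connected_clients = {}  # client_id -> connection, maintained by the server elsewhere

def callback(message):
    client_id = message.attributes.get("client_id")
    conn = connected_clients.get(client_id)
    if conn is None:
        # Not our client: drop our fanned-out copy. The original stays on the
        # global queue and will be fanned out again on redelivery.
        message.ack()
        return
    conn.send(message.data)
    message.ack()  # clear our copy from the server queue
    # Side-channel ack: tell whatever owns the global queue that this message was
    # actually delivered, so it can be acked there and not redelivered.
    publisher.publish(ack_topic_path, b"",
                      global_message_id=message.attributes["global_message_id"])

future = subscriber.subscribe(server_queue_path, callback=callback)
# future.result() would block in the server's main loop.
```

A small worker (or Lambda) would then consume the ack queue and ack the corresponding message off the global queue; messages that never get claimed stay there for redelivery.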
This is more complex, but should handle the requirements that all servers get a chance to see the message so that the correct server can find it, and that if a client isn't currently connected, the message will be saved somewhere (in this case in the global queue) for redelivery.
I'll leave the question open for a bit in case anyone has any better design ideas.