google-cloud-pubsub, broadcast

GCP PubSub - broadcast message - only relevant subscriber handles message


I'm using GCP PubSub as a backend for Web Sockets in a load-balanced environment. The current implementation has a topic for each server behind the load balancer and a mapping between end-users and servers. When I wish to send a message to a particular user, I use the mapping to determine which topic to publish it on.

This works, but it has a lot of moving parts, and requires cleanup of topics when servers are removed by downscaling or rolling out an updated version of the application.

I'm now exploring a more sophisticated implementation, whereby there is only one topic. Since each server knows its end users, in theory I could publish a message to this single topic and each server could inspect the message, comparing it to its list of users. Only the server which currently has a Web Socket connection for the end user specified in the message would handle it.
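With a single topic, the publisher no longer needs the user-to-server mapping; it only has to tag each message with its target user. A minimal sketch, assuming the `google-cloud-pubsub` `PublisherClient` (passed in as `publisher`), whose `publish()` accepts extra keyword arguments as string message attributes. The `user_id` attribute name and the `send_to_user` helper are hypothetical:

```python
def send_to_user(publisher, topic_path: str, user_id: str, payload: bytes):
    """Publish one message to the shared topic, tagged with its target user.

    `publisher` is assumed to behave like google.cloud.pubsub_v1.PublisherClient.
    The "user_id" attribute name is a hypothetical convention that every
    server's subscriber must agree on.
    """
    # The payload must be bytes; attribute values (passed as kwargs) must be str.
    return publisher.publish(topic_path, payload, user_id=user_id)
```

The call returns a future that resolves to the server-assigned message ID once the publish succeeds.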

Which brings me to my question(s) - how do I achieve this with PubSub?

  • Currently I'm using Pull subscriptions; do I need to use a Push subscription instead? Perhaps that would allow simultaneous delivery to all subscribers?
  • In the Pull model, if a subscriber doesn't .ack() a message, presumably the message is redelivered, but it could take a long time to reach the appropriate subscriber (defeating the purpose of Web Sockets, which is "real time" updates). Is this a fair assessment of how it would work in a Pull subscription model?
  • Am I using the wrong tool for the job? It's possible, but I'm hoping I just need to make different use of the current tool.

Solution

  • In this case, I would use this design:

    • Create only one topic where all the messages are published
    • When a VM starts, it creates its own pull subscription to the PubSub topic
    • When the VM shuts down, it deletes that subscription (in a shutdown script, for example)
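The VM lifecycle above could look roughly like this in Python, assuming the `google-cloud-pubsub` v2 `SubscriberClient` (passed in as `subscriber` so the logic can be exercised with a stand-in). The `ws-<vm_name>` subscription naming scheme and the helper names are hypothetical; the idea is that a startup hook calls `start_vm_subscription` and the shutdown script calls `stop_vm_subscription`:

```python
def subscription_path(project_id: str, vm_name: str) -> str:
    # Hypothetical naming scheme: one subscription per VM, derived from its name.
    return f"projects/{project_id}/subscriptions/ws-{vm_name}"


def start_vm_subscription(subscriber, project_id: str, topic_id: str, vm_name: str) -> str:
    """On VM start: create this VM's private pull subscription to the shared topic."""
    sub_path = subscription_path(project_id, vm_name)
    topic_path = f"projects/{project_id}/topics/{topic_id}"
    # create_subscription takes a request with the subscription name and its topic.
    subscriber.create_subscription(request={"name": sub_path, "topic": topic_path})
    return sub_path


def stop_vm_subscription(subscriber, project_id: str, vm_name: str) -> None:
    """On VM shutdown (e.g. from a shutdown script): delete the private subscription."""
    subscriber.delete_subscription(
        request={"subscription": subscription_path(project_id, vm_name)}
    )
```

In production, `subscriber` would be `google.cloud.pubsub_v1.SubscriberClient()`; keeping it as a parameter also makes the lifecycle easy to test.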

    Then, when a message is sent, it is published to the single PubSub topic and fanned out to all the active subscriptions. Each VM continuously pulls messages from its own subscription. When a message arrives in a VM's pull queue:
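To make the fan-out concrete, here is a toy in-memory model in plain Python (not the Pub/Sub client library, and all names are hypothetical): publishing once puts a copy of the message on every active subscription, acknowledging removes it only from the subscription that acknowledged it, and a deleted subscription stops receiving messages.

```python
from collections import deque


class ToyTopic:
    """In-memory stand-in for a Pub/Sub topic with one subscription per VM."""

    def __init__(self) -> None:
        self.subscriptions: dict[str, deque] = {}

    def create_subscription(self, name: str) -> None:
        self.subscriptions[name] = deque()

    def delete_subscription(self, name: str) -> None:
        del self.subscriptions[name]

    def publish(self, message: dict) -> None:
        # Fan-out: every active subscription receives its own copy.
        for queue in self.subscriptions.values():
            queue.append(dict(message))

    def pull(self, name: str):
        # Return the oldest outstanding message on this subscription, or None.
        queue = self.subscriptions[name]
        return queue[0] if queue else None

    def ack(self, name: str) -> None:
        # Acknowledging removes the message from this subscription only.
        self.subscriptions[name].popleft()
```

For example, after two VMs subscribe and one message is published, `vm-1` can ack its copy while `vm-2` still has its own.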

    • The VM checks whether the message is for one of its connected users:
      • if not, it acks the message (this removes it from its own subscription only, not from the others)
      • if so, it processes the message, then acks it
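The check-then-ack step maps naturally onto a streaming-pull callback. A minimal sketch, assuming the publisher sets a hypothetical `user_id` attribute and that the WebSocket server exposes a hypothetical `local_connections` dict from user ID to a send function; the callback relies only on the `attributes`, `data`, and `ack()` members of the client library's `Message` object:

```python
from typing import Callable, Dict


def make_callback(local_connections: Dict[str, Callable[[bytes], None]]):
    """Build this VM's streaming-pull callback.

    local_connections (hypothetical) maps user IDs connected to this VM to a
    function that pushes bytes over that user's WebSocket.
    """

    def callback(message) -> None:
        # Assumption: every message carries a "user_id" attribute.
        user_id = message.attributes.get("user_id")
        send = local_connections.get(user_id)
        if send is not None:
            # The user is connected here: forward the payload over the WebSocket.
            send(message.data)
        # Ack in both cases; this only removes the message from this VM's
        # subscription, so the other VMs' copies are unaffected.
        message.ack()

    return callback
```

Wiring it up would look roughly like `future = subscriber.subscribe(sub_path, callback=make_callback(connections))` followed by `future.result()` to block. The client runs callbacks on a thread pool, so `local_connections` should be safe to read from several threads.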

    With this design, you minimize latency and publish to only one topic. However, every message is duplicated into every subscription, and each VM spends processing power discarding the messages that are not for it.
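A rough way to size that overhead: with N VMs, each holding its own subscription, every published message is delivered N times, and if each user is connected to exactly one VM, N - 1 of those deliveries are discarded, i.e. a fraction (N - 1)/N. A small hypothetical helper doing this arithmetic:

```python
def fanout_cost(messages: int, vms: int) -> tuple[int, int]:
    """Return (total deliveries, discarded deliveries) for the one-topic design.

    Assumes every VM has its own subscription and each message targets a user
    connected to exactly one VM.
    """
    deliveries = messages * vms
    discarded = messages * (vms - 1)
    return deliveries, discarded


# 1,000 messages across 10 VMs: 10,000 deliveries, 9,000 of them discarded.
print(fanout_cost(1_000, 10))  # (10000, 9000)
```

The discarded share grows with the number of VMs, which is the trade-off to weigh against the simpler topology.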


    EDIT 1

    The principle is the following: you publish one message to the topic, the message is duplicated into every subscription, and the subscribers attached to a subscription (one or many) each receive a subset of that subscription's messages, or all of them if there is only one subscriber on the subscription.

    That's why, in my proposal:

    • Each VM creates its own subscription and is its only subscriber, so it receives a copy of every message published to the topic.
    • Irrelevant messages are acknowledged to remove them from the queue. They are deleted only from the current VM's subscription.
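The difference between sharing one subscription and giving each VM its own can be shown with a toy delivery model in plain Python (hypothetical names, not the client library). Every subscription gets every message, but within a subscription each message goes to only one subscriber. The round-robin split is a simplification for illustration; Pub/Sub does not promise any particular distribution among subscribers on the same subscription.

```python
from itertools import cycle


def deliver(messages, subscriptions):
    """Toy model of delivery from one topic.

    subscriptions maps a subscription name to the list of subscriber names
    attached to it (subscriber names are assumed unique across subscriptions).
    Returns a dict from subscriber name to the messages it received.
    """
    received = {s: [] for subs in subscriptions.values() for s in subs}
    for subs in subscriptions.values():
        # Within one subscription, each message goes to a single subscriber
        # (round-robin here, purely as a simplification).
        turn = cycle(subs)
        for msg in messages:
            received[next(turn)].append(msg)
    return received
```

With one shared subscription, `vm-1` and `vm-2` each see only part of the stream, so a message can land on the wrong VM; with one subscription per VM, both see every message, which is what the proposal relies on.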