
StreamingPullFuture fails under heavy load


In our project, we use Google Cloud Pub/Sub to route asynchronous jobs from applications to a worker. To subscribe, we call the subscribe method of pubsub_v1.SubscriberClient, which returns a subscriber.futures.StreamingPullFuture object. For example:

future = subscriber_client.subscribe(
    subscription, callback)

To check whether the subscription is still alive, we call running() on the StreamingPullFuture. For example:

if future.running():
    pass  # subscription is alive: do something
else:
    pass  # subscription has stopped: do something else

Here is the problem. When the worker gets high traffic, that is, when Pub/Sub holds more messages than the worker can process within some period of time, future.running() evaluates to False. When the traffic is fairly low and the worker can keep up, the check always evaluates to True.

Any idea why this is happening? And how should we handle future.running() == False? Should we re-subscribe?


Solution

  • It sounds like the subscriber is overwhelmed by the work and cannot keep up with the basic operations needed to keep the stream to the service open, e.g., heartbeats on the stream. If processing your messages is intensive enough that the subscriber becomes resource-constrained under high load, you probably want to set tighter flow control limits to reduce the amount of work outstanding to the subscriber client when publish load increases.

    With the flow control limits in place, you can either scale the number of subscribers horizontally so that the messages are processed in the same amount of time (spreading the load over more instances), or let a single subscriber process all the messages, which spreads the load over a longer period of time.
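
As a rough illustration of the flow control suggestion, here is a minimal sketch. The limits (10 messages, 10 MiB), the project and subscription names, and the callback body are placeholders I made up, so tune them for your workload. The watch_stream helper is also hypothetical and not part of the library: it is one possible way to react when the future completes with an error, on the assumption that running() returning False means the future is done. Whether re-subscribing is appropriate depends on why the stream ended.

```python
import time


def watch_stream(start_stream, max_restarts=3, backoff_s=1.0):
    """Block on a streaming future and re-subscribe if it ends with an error.

    start_stream is a zero-argument callable that returns a future
    (hypothetical helper, not part of google-cloud-pubsub).
    Returns the number of restarts that were needed.
    """
    restarts = 0
    while True:
        future = start_stream()
        try:
            future.result()  # blocks until the stream shuts down
            return restarts  # the future completed without an exception
        except Exception as exc:
            if restarts >= max_restarts:
                raise  # give up and surface the last error
            restarts += 1
            print(f"stream ended with {exc!r}; restart {restarts}/{max_restarts}")
            time.sleep(backoff_s * restarts)  # simple linear backoff


def main():
    # Imported here so watch_stream can be tried without the library installed.
    from google.cloud import pubsub_v1

    subscriber_client = pubsub_v1.SubscriberClient()
    # Placeholder names: replace with your own project and subscription.
    subscription = subscriber_client.subscription_path(
        "my-project", "my-subscription")

    # Tighter limits: at most 10 messages or 10 MiB outstanding at a time.
    # These numbers are assumptions for illustration only.
    flow_control = pubsub_v1.types.FlowControl(
        max_messages=10,
        max_bytes=10 * 1024 * 1024,
    )

    def callback(message):
        # Placeholder for the actual job processing.
        message.ack()

    def start_stream():
        return subscriber_client.subscribe(
            subscription, callback, flow_control=flow_control)

    watch_stream(start_stream)
```

Calling main() requires the google-cloud-pubsub package, credentials, and an existing subscription. Blocking on future.result() instead of polling future.running() also gives you the exception that ended the stream, which helps tell a resource problem apart from, say, a permissions error.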