zeromq, C++, is it necessary to set a high water mark for subscribers?

I did a quick test of ZeroMQ PUB/SUB and now have some working code. However, I am a bit confused about the concept of the high water mark as applied in ZeroMQ.

I have set a HWM in my publisher code which sets a queue length for each subscriber connected to the socket.

It is also possible, however, to set a HWM on the receiving socket of the subscriber. Is there any reason to set a HWM on the subscriber side, and how would this differ from setting a publisher HWM?

Solution

Short answer:

In the publisher, we should almost always set the HWM deliberately: an unbounded queue can exhaust memory and crash the publisher, and because the publisher serves all the subscribers, that affects the overall system.

In the subscriber, there are also cases in which tuning the HWM is useful, but this depends mostly on the nature of the subscriber: what it does with each received message, how likely it is to fall behind when a large number of messages arrive, and the expected runtime environment (how much memory is available, the number of subscribers, etc.).

More detailed answer:

ZMQ uses the concept of HWM (high-water mark) to define the capacity of its internal pipes. Each connection out of a socket or into a socket has its own pipe and HWM for sending and/or receiving, depending on the socket type. Some sockets (PUB, PUSH, RADIO) only have send buffers. Some (SUB, PULL, DISH) only have receive buffers. Some (REQ, REP, DEALER, ROUTER, PAIR) have both send and receive buffers.

The available socket options are:

ZMQ_SNDHWM: Set high water mark for outbound messages (on the publisher socket)
ZMQ_RCVHWM: Set high water mark for inbound messages (on the subscriber socket)

ZMQ 3.0+ forces default limits on its internal buffers (the so-called HWM), because the HWM is a great way to reduce memory overflow problems.

Both ZMQ_PUB and ZMQ_SUB have the ZMQ_HWM option action set to "Drop", so once the limit is reached, further messages are dropped and the memory used by the subscriber or the publisher should stop growing. How much memory is in use at that point depends on the ZMQ buffers.

Publishers usually need the most protection against uncontrolled memory use (out-of-memory issues). Where messages queue up depends on the transport:

Over the inproc transport the sender and receiver share the same buffers, so the real HWM is the sum of the HWM set by both sides.

But if you’re using TCP and a subscriber is slow, messages will queue up on the publisher.

Common failure causes of PUB-SUB include:

Subscribers can fetch messages too slowly, so queues build up and then overflow.
Networks can become too slow, so publisher-side queues overflow and publishers crash.

If messages are queued on the publisher, the publisher can run out of memory and crash, especially if there are lots of subscribers and it’s not possible to flush to disk for performance reasons.

From the publisher's perspective, the main strategy that a properly set HWM gives us is to stop queuing new messages after a while, so new messages just get rejected or dropped; it’s what ØMQ does when the publisher sets a HWM.
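
The "stop queuing and drop" strategy can be pictured with a toy model of a single per-subscriber pipe. This is only a sketch of the policy, not how libzmq implements its queues; `DroppingPipe` and its members are hypothetical names made up for this example.

```cpp
#include <cstddef>
#include <deque>
#include <string>
#include <utility>

// Toy model of one per-subscriber pipe with a "drop" policy at the HWM.
// NOT libzmq's implementation; all names here are hypothetical.
class DroppingPipe {
public:
    explicit DroppingPipe(std::size_t hwm) : hwm_(hwm) {}

    // Returns true if the message was queued, false if it was dropped
    // because the pipe had already reached its high-water mark.
    bool offer(std::string msg) {
        if (queue_.size() >= hwm_) {
            ++dropped_;
            return false;
        }
        queue_.push_back(std::move(msg));
        return true;
    }

    // Moves the oldest queued message into `out`; returns false if empty.
    bool take(std::string& out) {
        if (queue_.empty()) {
            return false;
        }
        out = std::move(queue_.front());
        queue_.pop_front();
        return true;
    }

    std::size_t size() const { return queue_.size(); }
    std::size_t dropped() const { return dropped_; }

private:
    std::size_t hwm_;
    std::size_t dropped_ = 0;
    std::deque<std::string> queue_;
};
```

The point of the model is that memory use is bounded by the HWM: however fast `offer` is called, the queue never holds more than `hwm` messages, and the cost of that bound shows up as the `dropped` count instead of as memory growth.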

ZMQ can also queue messages on the subscriber.

If anyone’s going to run out of memory and crash, it’ll be the subscriber rather than the publisher, which is fair. This is perfect for “peaky” streams where a subscriber can’t keep up for a while, but can catch up when the stream slows down.
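
To get a feel for how a subscriber-side buffer absorbs a "peaky" stream, the following sketch simulates a backlog tick by tick. It is a simplified model under stated assumptions (a fixed processing rate per tick and a hard drop once the backlog reaches the HWM); `simulate_backlog` and `BacklogStats` are hypothetical names, not part of any ZeroMQ API.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Summary of one simulated run (hypothetical type for this sketch).
struct BacklogStats {
    std::size_t peak_backlog = 0;   // largest number of queued messages seen
    std::size_t dropped = 0;        // messages rejected because of the HWM
    std::size_t final_backlog = 0;  // messages still queued after the last tick
};

// Simulates a receive buffer capped at `hwm`. In each tick, new messages
// arrive first (and are dropped if there is no room), then the subscriber
// processes up to `processed_per_tick` queued messages.
BacklogStats simulate_backlog(const std::vector<std::size_t>& arrivals_per_tick,
                              std::size_t processed_per_tick,
                              std::size_t hwm) {
    BacklogStats stats;
    std::size_t backlog = 0;  // invariant: backlog <= hwm
    for (std::size_t arrivals : arrivals_per_tick) {
        std::size_t room = hwm - backlog;
        std::size_t accepted = std::min(arrivals, room);
        stats.dropped += arrivals - accepted;
        backlog += accepted;
        stats.peak_backlog = std::max(stats.peak_backlog, backlog);
        backlog -= std::min(backlog, processed_per_tick);
    }
    stats.final_backlog = backlog;
    return stats;
}
```

With a burst that briefly exceeds the processing rate, a sufficiently large HWM lets the subscriber fall behind and then catch up without losing anything, while a smaller HWM trades lost messages for a lower peak memory footprint.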

Note: the HWMs are not exact; while you may get up to 1,000 messages by default, the real buffer size may be much lower (as little as half), due to the way libzmq implements its queues.

The primary source for these statements is Pieter Hintjens's book "Code Connected Volume 1", available online in electronic format; it has a chapter dedicated to High-Water Marks containing further explanations about this topic.