python publish-subscribe zeromq messagebroker

ZMQ pub/sub reliable/scalable design

I'm designin a pub/sub architecture using ZMQ. I need maximum reliability and scalability and am kind of lost in the hell of possibilities provided.

At the moment, I got a set a publishers and subscribers, linked by a broker. The broker is a simple forwarder device exposing a frontend for publishers, and a backend for subscribers.

I need to handle the case when the broker crashes or disconnects, and improve the overall scalability.

Okay, so i thought of adding multiple brokers, the publishers would round robin the broker to send messages to, and the subscribers would just subscribe to all these brokers.

Then i needed a way to retrieve the list of possible brokers, so i wrote a name service that provides a list of brokers on demand. Publishers and subscribers ask this service which brokers to connect to.

I also wrote a kind of "lazy pirate" (i.e. try/retry one after the other) reliable name service in case the main name service falls.

I'm starting to think that i'm designing it wrong since the codebase is non stop increasing in size and complexity. I'm lost in the jungle of possibilities provided by ZMQ.

Maybe something router/dealer based would be usable here ?

Any advice greatly appreciated !

Solution

It's not possible to answer your question directly because it's predicated on so many assumptions, many of which are probably wrong.

You're getting lost because you're using the wrong approach. Consider 0MQ as a language, one that you don't know very well yet. If you start by trying to write "maximum reliability and scalability", you're going to end up with Godzilla's vomit.

So: use the approach I use in the Guide. Start with a minimal solution to the core message flow and get that working properly. Think very carefully about the right kind of sockets to use. Then make incremental improvements, each time testing fully to make sure you understand what is actually going on. Refactor the code regularly, as you find it growing. Continue until you have a stable minimal version 1. Do not aim for "maximum" anything at the start.

Finally, when you've understood the problem better, start again from scratch and again, build up a working model in several steps.

Repeat until you have totally dominated the problem and learned the best ways to solve it.