
ZeroMQ Sockets Design


I need to send large volumes of short messages to a single process from different agents.

Approximately 10-15 messages/sec of 2-3 KB each, from each of 10 agents.

Some of the agents run on the same machine as the main service, while others run on separate machines in the same VLAN.

The requirement is minimal latency between the generation of a message and the start of its processing inside the main service. A maximum of 50 ms at peaks is acceptable ( though not desired ).

I am planning to use NetMQ for the transport.

Q1: What is the best configuration of socket types and protocols?

Q2: Is it better to use a DEALER/ROUTER, a PUSH/PULL or maybe a PUSH/ROUTER setup?

Q3: Should I use IPC sockets for agents on the same machine, or is TCP fast enough for both types of agents?

Q4: Do I need to create TCP and IPC sockets inside the same context, or should I create separate contexts?


Solution

  • On subject: Messaging, Transport, Latency

    ZeroMQ is a scalable formal communication patterns framework. It gives a set of smart objects and hides a lot of the internal nitty-gritty from designers ( which is very beneficial ).

    ZeroMQ is designed for Messaging

    Transport-agnostic, which means that the majority of use-cases need not care over which particular transport class ( { inproc: | ipc: | tcp: | pgm: | epgm: } ) the delivery takes place

    Low-latency optimised ( following a "batteries included" principle )
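As a sketch of that transport-agnosticism ( the question targets NetMQ/C#, but pyzmq is used here to illustrate the same ZeroMQ concept; the endpoint names are illustrative assumptions ), note that one socket may even bind several transport classes at once, and the receiving code stays identical:

```python
import zmq

ctx = zmq.Context()

collector = ctx.socket(zmq.PULL)
# A single socket may bind several transport classes at once; the
# receiving code is identical whichever wire a message arrived on.
collector.bind("tcp://127.0.0.1:*")   # e.g. for remote agents in the VLAN
collector.bind("inproc://agents")     # e.g. for agents inside the same process

sender = ctx.socket(zmq.PUSH)
sender.connect("inproc://agents")
sender.send(b"hello")

received = collector.recv()           # arrives regardless of transport

sender.close(linger=0)
collector.close(linger=0)
ctx.term()
```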

    The key thing you seem to have missed is that the main difference comes from the fact that your application shall build on the ZeroMQ archetypes' BEHAVIOUR, which is presented under use-case metaphoric names ( PUSH/PULL ), but understanding that behaviour is very important for any good design.

    Latency will only sky-rocket if a poor design decision makes the software use a "wrong" ( not well selected ) ZeroMQ archetype, whose BEHAVIOURAL MODEL will in turn introduce an inevitable need to handle / wait / block in the work-flow model, just due to the misunderstood internal BEHAVIOUR of, for example, REQ/REP and the like.
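A minimal pyzmq sketch of that behavioural difference ( pyzmq stands in for NetMQ here; endpoint names are illustrative ): the REQ archetype enforces a strict send/recv lock-step and its state machine refuses a second send, while PUSH has no such lock-step and simply queues consecutive messages:

```python
import zmq

ctx = zmq.Context()

rep = ctx.socket(zmq.REP)
rep.bind("inproc://demo")

req = ctx.socket(zmq.REQ)
req.connect("inproc://demo")

req.send(b"first")            # OK: REQ is now waiting for a reply
try:
    req.send(b"second")       # refused by REQ's state machine (EFSM)
    lockstep_enforced = False
except zmq.ZMQError:
    lockstep_enforced = True

# PUSH has no such lock-step -- consecutive sends just queue up.
pull = ctx.socket(zmq.PULL)
pull.bind("inproc://sink")
push = ctx.socket(zmq.PUSH)
push.connect("inproc://sink")
push.send(b"one")
push.send(b"two")
got = [pull.recv(), pull.recv()]

for s in (rep, req, push, pull):
    s.close(linger=0)
ctx.term()
```

A work-flow built on a blocking archetype like REQ/REP would serialise the agents; PUSH/PULL-style fan-in avoids that wait state by design.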

    This way, the lowest achievable latency results from smart design, not from (*)MQ-library.

    With craftsmanship in near real-time system design, ZeroMQ will contribute its best genuine services to your code; however, do not expect a poor system architecture to gain magic speeds and near-zero latencies just by linking in a few ZeroMQ-socket methods.

    On .Context( nIOthreads )

    A generally good practice is to isolate design features, so as to be able to validate their respective impact on the final criteria. Should minimum overall end-to-end latency be the goal, one step may be to increase the number of IO-threads instantiated and used by the localhost application via the .Context() method.
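In pyzmq terms ( used here as a sketch of the knob the answer refers to; NetMQ exposes the same underlying libzmq option through its own API ), the IO-thread count is a property of the context, set at creation time:

```python
import zmq

# The number of I/O threads is configured on the context itself.
ctx = zmq.Context(io_threads=2)

# It can be read back as a context option for verification.
io_threads = ctx.get(zmq.IO_THREADS)

ctx.term()
```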

    10 x 15 x 3 kB per second is not a big issue even for a Raspberry Pi processor.
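The arithmetic behind that claim, using the question's stated peak figures ( 10 agents, 15 messages/s each, 3 KB per message ):

```python
agents = 10           # agents feeding the main service
msgs_per_sec = 15     # peak per-agent message rate
msg_bytes = 3 * 1024  # peak message size ( 3 KB )

total_bytes_per_sec = agents * msgs_per_sec * msg_bytes
total_mbit_per_sec = total_bytes_per_sec * 8 / 1_000_000

# ~450 KB/s, i.e. under 4 Mbit/s -- trivial for any modern NIC or CPU.
```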

    More important is whether this sole step will bring any significant improvement over other design compromises and/or other IO-workload-related activities.

    Test it.

    However test it "in-vivo", not "in-vitro" under a set of artificially academic circumstances. Then you gain without losing, as your "surrounding" architecture is already implemented and you measure in the very context of your designed modus operandi.

    Where to go next?

    A good point to start from is to spend a week of hands-on testing with Pieter HINTJENS' book "Code Connected, Volume 1" ( as PDF ). There you will find both a fabulous in-depth knowledge pool and code examples that will help you move forward.


    Let the merit-based facts get the focus:

    A4: No. The book emphasises that, where possible, zmq.Context() is a singleton, and in cases where architecture / design / tests prove it beneficial, the zmq.Context() instance may have more than just one I/O-thread to handle the specific traffic patterns at low level ( without any other assistance / control from the user-code side ). In this very sense, it makes no sense to speculate on such a potential benefit before the implemented and pre-tested solution goes live. Once the solution is operational, it is possible to increase this parameter and experimentally measure its influence on the overall processing performance.
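A pyzmq sketch of A4 ( pyzmq stands in for NetMQ; endpoint names are illustrative, and inproc:// is used in place of ipc:// for portability -- ipc:// works the same way on non-Windows hosts ): one context comfortably hosts sockets over different transports.

```python
import zmq

# One context suffices for all transport classes.
ctx = zmq.Context.instance()   # pyzmq's process-wide singleton accessor

tcp_sock = ctx.socket(zmq.PULL)
tcp_sock.bind("tcp://127.0.0.1:*")            # ephemeral port, illustrative
last = tcp_sock.getsockopt(zmq.LAST_ENDPOINT)  # the actual bound endpoint

local_sock = ctx.socket(zmq.PULL)
local_sock.bind("inproc://local-agents")       # local transport, same context

tcp_sock.close(linger=0)
local_sock.close(linger=0)
ctx.term()
```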

    A good practice to always bear in mind is to take due care about enforcing graceful terminations of socket(s) & context life-cycles, so as to avoid dirty leaks. Such practice is more than recommended.
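A minimal sketch of such a graceful life-cycle in pyzmq ( the endpoint is an illustrative assumption; NetMQ offers analogous disposal patterns ): a zero LINGER bounds how long close() / term() may keep trying to flush pending messages, and context managers make the close-then-terminate ordering explicit and leak-free.

```python
import zmq

with zmq.Context() as ctx:               # terminates the context on exit
    with ctx.socket(zmq.PUSH) as sock:   # closes the socket on exit
        sock.setsockopt(zmq.LINGER, 0)   # do not block termination on unsent data
        sock.connect("tcp://127.0.0.1:5556")  # illustrative peer, may be absent
        sock.send(b"fire-and-forget")    # queued; LINGER=0 discards it at close
```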

    A3: Yes, you may use the TCP transport class for them too. On non-Windows localhosts, an IPC option becomes feasible. Speed questions should rather be measured, not opinion-based. If your code tuning gets down to shaving off additional [us]-s and [ns]-s ( zmq.Stopwatch goes down to about thirty nanoseconds in its resolution ), then you may, based on your code design, expect some slight benefit from avoiding the TCP-protocol wire-line framing assembly / re-assembly / decode. The common question is at what additional cost such a measurable / observable speed-up becomes available.
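A sketch of such a measurement in pyzmq ( standing in for NetMQ; the loopback endpoint and 3 KB payload mirror the question's figures ): zmq.Stopwatch is the micro-benchmark helper mentioned above -- start(), then stop() returns the elapsed wall-clock time in microseconds.

```python
import zmq

ctx = zmq.Context()

pull = ctx.socket(zmq.PULL)
pull.bind("tcp://127.0.0.1:*")                          # ephemeral local port
endpoint = pull.getsockopt(zmq.LAST_ENDPOINT).decode()  # actual bound address

push = ctx.socket(zmq.PUSH)
push.connect(endpoint)

timer = zmq.Stopwatch()
timer.start()
push.send(b"x" * 3072)       # one 3 KB message, as in the question
pull.recv()
elapsed_us = timer.stop()    # microseconds for one local tcp:// hop

push.close(linger=0)
pull.close(linger=0)
ctx.term()
```

Swapping the `tcp://` endpoint for an `ipc://` one ( on non-Windows hosts ) and re-running gives the measured, not guessed, answer to Q3 on your own hardware.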

    A2: Too broad to answer. Without a single word about the processing architecture / taskUnit workflow, there is no serious way to answer Q2.

    A1: There are no criteria for what would make an optimisation goal "The Best". Having no criteria / metrics, only a generic rule is valid -- the best configuration is any one that fully meets the spec and works fine and stable within the given time & budget.