c, error-handling, zeromq

Exception handling in a pub-sub scheme (ZeroMQ)


I've created a publisher-subscriber communication scheme with ZeroMQ and noticed one small issue with my server and client programs. From my brief research I know that C has no try/catch, but running the two while(1) loops below without any error handling seems dangerous to me.

Given the following code snippets, what would be the most correct way to handle an exception inside the while loop? With the structure I have right now (shown below), zmq_close and zmq_ctx_destroy never execute, but I want them to run if the program hits an error or exception, whatever its origin.

Note: In this architecture one client listens to multiple publishers, hence the for loops in the Client code.

Server

(...inside main)

while (1) {
    char update[20];
    sprintf(update, "%s", "new_update");
    s_send(publisher, update);
    sleep(1);
}

zmq_close(publisher);
zmq_ctx_destroy(context);
return 0;

Client

(...inside main)

while(1){
    for (c = 1; c < server_num; c = c + 1){
        char *msg = s_recv(subscribers[c]);

        if (msg) {
            printf("%s\n",msg);
            free(msg);
        }
        sleep(1);
    }
}

for (c = 0; c < server_num; c = c + 1)
    zmq_close(subscribers[c]);

zmq_ctx_destroy(context);
return 0;

Solution

  • Since the question is tagged c:

    Q : what would be the most correct way to handle an exception (inside the while)?

    The best strategy is error prevention rather than any kind of "reactive" ( ex-post exception ) handling.

    Always assume that things can and will go wrong, and let them fail cheaply. The cheaper a failure is, the sooner the system can return to its intended behaviour.

    That said, in modern low-latency distributed systems, and even more so in real-time systems, an exception is an extremely expensive and disruptive element in the designed code-execution flow.


    For these reasons, and to allow sustained top performance, ZeroMQ has always taken a very different approach :

    0)
    Use zmq_poll() as the cheapest way to detect whether any readable message is present ( already delivered and ready to be received ) before calling zmq_recv() at all to fetch that data from the Context()-instance internal storage into your application-level code.

    1)
    Depending on your language binding (wrapper), prefer the non-blocking forms of the .poll(), .send() and .recv() methods. The native API makes this mode straightforward with retCode = zmq_recv( ..., ZMQ_DONTWAIT ); ( ZMQ_NOBLOCK is the older name for the same flag ).

    2)
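    Always analyse the retCode, be it in a terse assert( retCode != -1 ) or in a more explanatory test that reports zmq_strerror( zmq_errno() ). ZeroMQ functions signal failure by returning -1 ( or NULL for functions that return a pointer ) and leave the reason in zmq_errno(); they do not return the error code itself.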

    3)
    Review and fine-tune the configuration attributes of the ZeroMQ tools you instantiate, and harness their less obvious strengths to match your application domain's needs. Many native API settings help mitigate, or avoid altogether, colliding requirements right inside the Context()-engine instance, so it pays to learn the available settings in detail and use them in your code.

    Without all of the above, your code is not making the best of the Zen-of-Zero.


    Q : With the structure I have right now (...), the zmq_close and zmq_ctx_destroy will never execute, but I want them to, in case of a program error/exception (whichever the origin).

    It is fair enough to use an explicit flag:

    // needs <stdbool.h>, <errno.h> and <zmq.h>
    bool    DoNotExitSoFar = true;
    int     retCode;
    
    while ( DoNotExitSoFar ){
        // Do whatever you need
    
        // Record return-codes, always
           retCode = zmq_...(...);
    
        // Test/Set the explicit flag upon a context of retCode and zmq_errno():
        // ZeroMQ calls return -1 on failure and leave the reason in zmq_errno()
           if ( retCode == -1 && zmq_errno() == EPROTONOSUPPORT ){
             // take all due measures needed
                ...
             // FINALLY: Set
                DoNotExitSoFar = false;
           }
    }
    
    // --------------------------------------------- GRACEFUL TERMINATION .close()
    if ( -1 == zmq_close(...) && ENOTSOCK == zmq_errno() ) { ...; }
    ...
    // --------------------------------------------- GRACEFUL TERMINATION .term()
    retCode = zmq_ctx_term(...);
    
    if ( -1 == retCode && EINTR  == zmq_errno() ){ ...; }
    if ( -1 == retCode && EFAULT == zmq_errno() ){ ...; }
    ...
    

    Other tooling, such as int atexit(void (*func)(void));, may serve as a last resort for calling zmq_close() or zmq_ctx_term() as late as possible ( ALAP ).