Search code examples
pythonzeromqdistributed-computingpyzmqlow-latency

Is time.sleep() impacting the ZeroMQ poller.poll() response?


I'm getting results I can't explain when making a ZeroMQ REQ/REP request with polling.

The elapsed time, taken by the poll()-method, is influenced by a time.sleep() elsewhere in the code.

I have a server process which runs a command, and returns the pid of the process. This is the request side code which has the issue:

import datetime, sys, time, zmq

sleeptime = float(sys.argv[1])

def log_msg(msg):
    print "{0}, {1}".format(datetime.datetime.utcnow().strftime('%Y%m%d_%H:%M:%S.%f'), msg)

context = zmq.Context()
socket = context.socket(zmq.REQ)
socket.connect("tcp://{0}:{1}".format('myserver', 9980))
poller = zmq.Poller()
poller.register(socket, zmq.POLLIN)
req = { 'req_type':'raw_cmd', 'cmd':'echo hello', 'block':0, 'timeout':300, 'return_output': 0}

for i in range(4):
    time.sleep(sleeptime)
    socket.send_json(req)
    start = time.time()
    socks = dict((poller.poll(30000)))
    elapsed = (time.time()-start)*1000
    rep = socket.recv_json()
    log_msg('pid={0}, sleep={1}, time taken={2}'.format(rep['pid'], sleeptime, int(elapsed)))

The first poll elapsed time is quite variable, but all the subsequent ones take about 2 seconds less the sleep time, unless sleep time is 0, in which case it's fast, so:

sleep = 0.5 :

20201008_08:27:24.168800, pid=52528, sleep=0.5, time taken=505
20201008_08:27:26.210196, pid=52529, sleep=0.5, time taken=1540
20201008_08:27:28.250891, pid=52530, sleep=0.5, time taken=1539
20201008_08:27:30.295036, pid=52531, sleep=0.5, time taken=1543

sleep = 1.5 :

20201008_08:44:02.474492, pid=54730, sleep=1.5, time taken=295
20201008_08:44:04.516844, pid=54731, sleep=1.5, time taken=540
20201008_08:44:06.557980, pid=54732, sleep=1.5, time taken=539
20201008_08:44:08.599717, pid=54733, sleep=1.5, time taken=539

sleep = 0 :

20201008_08:27:13.999147, pid=52513, sleep=0.0, time taken=690
20201008_08:27:14.033915, pid=52514, sleep=0.0, time taken=34
20201008_08:27:14.068803, pid=52515, sleep=0.0, time taken=34
20201008_08:27:14.103947, pid=52516, sleep=0.0, time taken=34

So question is, why does the time.sleep() affect the time taken by the ZeroMQ poller.poll()?


Solution

  • The code as-is mixes apples with oranges:

    time.time() returns seconds since epoch as float, i.e. ( time.time() - start )*1000 yields how many [ms] milliseconds the code-section under test actually took.

    Whereas the poller.poll()-method expects milliseconds for it's timeout, so the requested 30000 [ms] makes the poller-instance wait and hold until 30 [s] elapse or a message-arrival related zmq.POLLIN-event permits to step forwards anywhere earlier.

    This said, your measurement explores not the .sleep()-method, but intervals taken at random by an external process, that sends REP-side answers to REQ-side .send()-s.

    Still in doubts? Set sleeptime to 31 [s] and you ought receive almost zero durations reported from an ill-formulated local time-bracket (sure, if and only if the remote REP-lying entity did not crash of suffocate during such ops).

    You might also like zmq.Stopwatch class, having { .start(), .stop() }-methods, that provide a smart tool for using a ZeroMQ native [us]-resolution-clock. Most of my posts on ZeroMQ performance use this for microbenchmarking.