Search code examples
pythonzeromqpyzmq

pyzmq context sometimes slow to initialize


I have recently started to use ZeroMQ and have observed that sometimes initialization of the ZeroMQ context can take as much as 40 seconds. Usually it's much faster than that.

Eg:

import zmq

# the next line will sometimes take 40 seconds
# or so but is usually near instantaneous
with zmq.Context.instance() as context:
    # connect / bind to socket, do stuff
    pass

This behaviour seems to be independent of whether or not I use a context manager, eg the following behaves the same as the above:

import zmq

# the next line will sometimes take 40 seconds
# or so but is usually near instantaneous
context = zmq.Context.instance()

# connect / bind to socket, do stuff

context.term()

This is running in Python 3.6.8 on Ubuntu 18.04.2

I haven't been able to establish any pattern to the slowdowns.

What are the possible causes of this behaviour?

Update

Per @user3666197 suggestion, I took some timings and managed to get the slowdown to manifest on the second run. Things returned to "normal" after that.

The slowdown kicked in on the second run of the test script:

 = zmq.Context()] took +      704 [us] to execute
 1.~                          704  us
 2.~                  131 766 816  us
 3.~                          145  us
 4.~                          137  us

Watching memory on htop at the time did not show memory filling up at all but swap was already active.

Update 2

Using a pruned-down version of @user3666197's suggested script:

import sys

import zmq
clock = zmq.Stopwatch()

clock.start()
foo = zmq.Context()
bar = clock.stop()
print("{0:>2s}: {1:>9d}".format(sys.argv[1], bar))
foo.term()

And calling it in a loop, eg:

for x in $(seq 1 20);
do python3 test.zmq.py $x;
done

I can repeat the issue on the original server but have yet to hit it on another server, which suggests an environmental issue (or something to do with the specific distro version and/or python version as the "good" server is older).

This is sufficient for me for now but I will come back to it later and gather enough info to submit something to the maintainers.


Solution

  • What are the possible causes of this ( only "sometimes" present ) behaviour? "
    I haven't been able to establish any pattern to the slowdowns.

    a ) those 40 seconds,
    mentioned above, could make sense but with a RAM to swap-I/O operations. I cannot imagine any other reason for such block of time ( I've never noticed anything close to the 3/4 of a minute delay in ZeroMQ during my last ~10 years with ZeroMQ, never )

    b ) the reality check
    always measure the respective actual times of execution, before any claim get raised :
    let's start to be systematic and fair on repeatable and documented test-cases, ok?
    Categories like "sometimes" are not worth, data are. As a start, we may use the following templated approach, to discriminate the root cause of any actual delay :

    M = "[{0:_>40s}] took +{1:_>9d} [us] to execute"
    N = 0
    from zmq import Stopwatch; aClk = Stopwatch()
    #------------------------------
    pass;                      aClk.start()
    import zmq
    pass;                  _ = aClk.stop(); print( M.format( "IMPORT", _ ) )
    #------------------------------
    pass;                      aClk.start()
    aCTX = zmq.Context()
    pass;                  _ = aClk.stop(); print( M.format( "= zmq.Context()", _ ) )
    #------------------------------
    pass;                      aClk.start()
    aCTX.term()
    pass;                  _ = aClk.stop(); print( M.format( ".term()", _ ) )
    #------------------------------
    pass;                      aClk.start()
    context = zmq.Context.instance()
    pass;                  _ = aClk.stop(); print( M.format( "= zmq.Context.instance()", _ ) )
    #------------------------------
    pass;                      aClk.start()
    context.term()
    pass;                  _ = aClk.stop(); print( M.format( ".term()", _ ) )
    #------------------------------
    pass;                      aClk.start()
    with zmq.Context.instance() as aCtxInCtxMGR:
         N +=1
    pass;                  _ = aClk.stop(); print( M.format( "with … as aCtxInCtxMGR:", _ ) )
    #------------------------------
    pass;                      aClk.start()
    aCTX = zmq.Context(  0 )
    pass;                  _ = aClk.stop(); print( M.format( "= zmq.Context(  0 )", _ ) )
    #------------------------------
    pass;                      aClk.start()
    aCTX.term()
    pass;                  _ = aClk.stop(); print( M.format( ".term()", _ ) )
    #------------------------------
    pass;                      aClk.start()
    aCTX = zmq.Context( 10 )
    pass;                  _ = aClk.stop(); print( M.format( "= zmq.Context( 10 )", _ ) )
    #------------------------------
    pass;                      aClk.start()
    aCTX.term()
    pass;                  _ = aClk.stop(); print( M.format( ".term()", _ ) )
    #------------------------------
    pass;                      aClk.start()
    aCTX = zmq.Context( 20 )
    pass;                  _ = aClk.stop(); print( M.format( "= zmq.Context( 20 )", _ ) )
    #------------------------------
    pass;                      aClk.start()
    aCTX.term()
    pass;                  _ = aClk.stop(); print( M.format( ".term()", _ ) )
    #------------------------------
    pass;                      aClk.start()
    aCTX = zmq.Context( 50 )
    pass;                  _ = aClk.stop(); print( M.format( "= zmq.Context( 50 )", _ ) )
    #------------------------------
    pass;                      aClk.start()
    aCTX.term()
    pass;                  _ = aClk.stop(); print( M.format( ".term()", _ ) )
    #------------------------------
    pass;                      aClk.start()
    aCTX = zmq.Context( 99 )
    pass;                  _ = aClk.stop(); print( M.format( "= zmq.Context( 99 )", _ ) )
    #------------------------------
    pass;                      aClk.start()
    aCTX.term()
    pass;                  _ = aClk.stop(); print( M.format( ".term()", _ ) )
    #------------------------------
    

    et cetera
    ad plausible


    Your actual mileage may vary ( and will )

    [__________________________________IMPORT] took +_____1198 [us] to execute
    [_________________________= zmq.Context()] took +_____1523 [us] to execute
    [________________= zmq.Context.instance()] took +_____1348 [us] to execute
    [_________________with … as aCtxInCtxMGR:] took +_____7909 [us] to execute
    [_____________________= zmq.Context(  0 )] took +______966 [us] to execute
    [_____________________= zmq.Context( 10 )] took +______944 [us] to execute
    [_____________________= zmq.Context( 20 )] took +______936 [us] to execute
    [_____________________= zmq.Context( 50 )] took +______962 [us] to execute
    [_____________________= zmq.Context( 99 )] took +_____1015 [us] to execute
    

    [__________________________________IMPORT] took +______879 [us] to execute
    [_________________________= zmq.Context()] took +_____1076 [us] to execute
    [_________________________________.term()] took +______881 [us] to execute
    [________________= zmq.Context.instance()] took +_____1105 [us] to execute
    [_________________________________.term()] took +______896 [us] to execute
    [_________________with … as aCtxInCtxMGR:] took +_____1474 [us] to execute
    [_____________________= zmq.Context(  0 )] took +_____1163 [us] to execute
    [_________________________________.term()] took +______982 [us] to execute
    [_____________________= zmq.Context( 10 )] took +_____2646 [us] to execute
    [_________________________________.term()] took +______873 [us] to execute
    [_____________________= zmq.Context( 20 )] took +_____2689 [us] to execute
    [_________________________________.term()] took +______981 [us] to execute
    [_____________________= zmq.Context( 50 )] took +_____2772 [us] to execute
    [_________________________________.term()] took +______910 [us] to execute
    [_____________________= zmq.Context( 99 )] took +_____1086 [us] to execute
    [_________________________________.term()] took +_____1010 [us] to execute