It appears that I can call q.put 1000 times in under 2.5 ms. How is that possible when just pickling that very same array 1000 times takes over 2 seconds?
>>> import pickle
>>> import timeit
>>> import numpy as np
>>> from multiprocessing import Queue
>>> a = np.random.rand(1024, 1024)
>>> q = Queue()
>>> timeit.timeit(lambda: q.put(a), number=1000)
0.0025581769878044724
>>> timeit.timeit(lambda: pickle.dumps(a), number=1000)
2.690145633998327
Obviously, I am not understanding something about how Queue.put works. Can anyone enlighten me?
I also observed the following:
>>> def f():
...     q.put(a)
...     q.get()
...
>>> timeit.timeit(lambda: f(), number=1000)
42.33058542700019
This appears to be more realistic, and suggests to me that simply calling q.put() returns before the object is actually serialized. Is that correct?
The multiprocessing implementation has a number of moving parts under the covers. Here, putting an object on a multiprocessing.Queue is mostly handled by a hidden (to the end user) feeder thread. .put() just appends a reference to the object onto an internal buffer (fast and roughly constant-time), and that feeder thread does the actual pickling, and writes the bytes to the underlying pipe, when it gets around to it.
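One way to see this for yourself (a minimal sketch of my own, not from the original post): .put() something unpicklable. The call itself succeeds, because serialization is deferred to the feeder thread; the pickling error surfaces later, as a traceback on stderr, and the item never arrives.

```python
import queue
from multiprocessing import Queue

q = Queue()

# Lambdas cannot be pickled, yet .put() returns without complaint:
# it only hands a reference to the hidden feeder thread.
q.put(lambda x: x)

# The feeder thread hits the pickling error later (it prints a traceback
# to stderr), and the item is silently dropped -- so .get() times out.
try:
    q.get(timeout=2)
    arrived = True
except queue.Empty:
    arrived = False

print(arrived)  # False: serialization happened (and failed) after .put() returned
```

If pickling happened inside .put(), the first call would have raised immediately; instead the failure is asynchronous.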
This can burn you, though: if, in your example, the main program goes on to mutate the np array after the .put(), an undefined number of those mutations may be captured in the eventually pickled state. The user-level .put() only captures a reference to the object, nothing about the object's state at the time of the call.
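A minimal sketch of the defensive fix (my own illustration, not from the original answer): snapshot the object before the .put(), so later mutations in the producing process cannot leak into the state the feeder thread eventually pickles.

```python
import copy
from multiprocessing import Queue

q = Queue()
data = [0, 0, 0, 0]

# Put a deep copy: the feeder thread will pickle the snapshot,
# not whatever `data` looks like when it gets around to it.
q.put(copy.deepcopy(data))

data[0] = 99          # mutate after the .put()

received = q.get()
print(received)       # [0, 0, 0, 0] -- the later mutation did not leak
```

For a NumPy array, q.put(a.copy()) achieves the same thing; alternatively, pickle in the caller yourself (q.put(pickle.dumps(a))) to pin down exactly which state gets sent, at the cost of paying for serialization up front.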