It appears that I can call q.put 1000 times in under 2.5 ms. How is that possible when just pickling that very same array 1000 times takes over 2 seconds?
>>> import pickle
>>> import timeit
>>> import numpy as np
>>> from multiprocessing import Queue
>>> a = np.random.rand(1024, 1024)
>>> q = Queue()
>>> timeit.timeit(lambda: q.put(a), number=1000)
0.0025581769878044724
>>> timeit.timeit(lambda: pickle.dumps(a), number=1000)
2.690145633998327
Obviously, I am not understanding something about how Queue.put works. Can anyone enlighten me?
I also observed the following:
>>> def f():
...     q.put(a)
...     q.get()
...
>>> timeit.timeit(lambda: f(), number=1000)
42.33058542700019
This appears to be more realistic, and suggests to me that simply calling q.put() returns before the object is actually serialized. Is that correct?
The multiprocessing implementation has a number of moving parts under the covers. Here, putting an object on a multiprocessing.Queue is mostly handled by a hidden (to the end user) feeder thread. .put() just appends a reference to the object onto an internal buffer (fast and roughly constant-time), and that feeder thread does the actual pickling, and writes the bytes to the underlying pipe, when it gets around to it.
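One way to see this for yourself (a minimal sketch of my own, not from the original post): .put() something unpicklable. The call itself succeeds, because serialization is deferred to the feeder thread; the pickling error surfaces later, as a traceback on stderr, and the item never arrives.

```python
import queue
from multiprocessing import Queue

q = Queue()

# Lambdas cannot be pickled, yet .put() returns without complaint:
# it only hands a reference to the hidden feeder thread.
q.put(lambda x: x)

# The feeder thread hits the pickling error later (it prints a traceback
# to stderr), and the item is silently dropped -- so .get() times out.
try:
    q.get(timeout=2)
    arrived = True
except queue.Empty:
    arrived = False

print(arrived)  # False: serialization happened (and failed) after .put() returned
```

If pickling happened inside .put(), the first call would have raised immediately; instead the failure is asynchronous.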
This can burn you, though: if, in your example, the main program goes on to mutate the np array after the .put(), an undefined number of those mutations may be captured in the eventually pickled state. The user-level .put() only captures a reference to the object, nothing about the object's state at the time of the call.
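A minimal sketch of the defensive fix (my own illustration, not from the original answer): snapshot the object before the .put(), so later mutations in the producing process cannot leak into the state the feeder thread eventually pickles.

```python
import copy
from multiprocessing import Queue

q = Queue()
data = [0, 0, 0, 0]

# Put a deep copy: the feeder thread will pickle the snapshot,
# not whatever `data` looks like when it gets around to it.
q.put(copy.deepcopy(data))

data[0] = 99          # mutate after the .put()

received = q.get()
print(received)       # [0, 0, 0, 0] -- the later mutation did not leak
```

For a NumPy array, q.put(a.copy()) achieves the same thing; alternatively, pickle in the caller yourself (q.put(pickle.dumps(a))) to pin down exactly which state gets sent, at the cost of paying for serialization up front.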