Let's assume we create a numpy array that is a view on another array, using stride tricks:
import numpy as np
from numpy.lib import stride_tricks
x = np.arange(20).reshape([4, 5])
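# Take the byte strides from x itself; hard-coded (20, 20, 4) is only right for 4-byte ints
row, col = x.strides
arr = stride_tricks.as_strided(x, shape=(3, 2, 5), strides=(row, row, col))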
We can confirm that this new array is indeed a view:
assert not arr.flags['OWNDATA']  # passes: arr does not own its data buffer
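A quick way to see what the stride trick builds is to check where arr's elements live. This sketch follows the question's setup but takes the strides from x.strides instead of hard-coding byte counts (the question's (20, 20, 4) assumes 4-byte integers). Both leading axes step by one row of x, so arr is three overlapping 2-row windows over the same 20 integers.

```python
import numpy as np
from numpy.lib import stride_tricks

x = np.arange(20).reshape(4, 5)
row, col = x.strides  # byte step between rows / between elements of x
arr = stride_tricks.as_strided(x, shape=(3, 2, 5), strides=(row, row, col))

# Axes 0 and 1 both advance by one row of x, so arr[i, j] is row i + j of x.
for i in range(3):
    for j in range(2):
        assert np.array_equal(arr[i, j], x[i + j])

print(np.shares_memory(arr, x))  # True: no new buffer was allocated
print(arr.size, x.size)          # 30 20: 30 elements exposed, 20 stored

# A write through x shows up in every window that covers that row.
x[1, 0] = -1
print(arr[0, 1, 0], arr[1, 0, 0])  # -1 -1
```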
Question:
If I pass arr as an argument into multiprocessing.Process(), will arr be copied into each process? Will x be copied? Please explain why.
If the sharing is via pickle serialization, then clearly the view (however it was generated) will produce a copy:
In [298]: x = np.arange(10)
In [299]: y = x.reshape(2,5)
In [300]: import pickle
In [301]: B = pickle.dumps(y)
In [302]: Y = pickle.loads(B)
In [303]: Y
Out[303]:
array([[0, 1, 2, 3, 4],
       [5, 6, 7, 8, 9]])
In [304]: y.__array_interface__['data']
Out[304]: (43176224, False)
In [305]: x.__array_interface__['data']
Out[305]: (43176224, False)
In [306]: Y.__array_interface__['data']
Out[306]: (59035584, False)
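The question's as_strided view behaves the same way under a pickle round trip, which is what a child process receives when Process arguments are pickled. In this sketch (strides again taken from x.strides) the unpickled copy holds the same values in its own C-contiguous buffer, so later changes to x no longer reach it, and the two copies of row 1 that overlapped in arr become independent.

```python
import pickle
import numpy as np
from numpy.lib import stride_tricks

x = np.arange(20).reshape(4, 5)
row, col = x.strides
arr = stride_tricks.as_strided(x, shape=(3, 2, 5), strides=(row, row, col))

# Round-trip through pickle, as happens when Process arguments are pickled.
Y = pickle.loads(pickle.dumps(arr))

print(np.array_equal(Y, arr))    # True: same values ...
print(np.shares_memory(Y, x))    # False: ... but in a brand-new buffer
print(arr.flags['C_CONTIGUOUS'], Y.flags['C_CONTIGUOUS'])  # False True

# Changes to x no longer reach the copy.
x[1, 0] = -1
print(arr[0, 1, 0], Y[0, 1, 0])  # -1 5

# Row 1 of x is stored twice in Y, and the two copies are now independent.
Y[0, 1, 0] = 99
print(Y[1, 0, 0])                # 5
```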
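For what it's worth, pickling a numpy array writes out its raw element bytes (ndarray.__reduce__ packs them into the pickle), much as np.save writes them to a file, so the unpickled array always gets a fresh buffer of its own.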
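Whichever routine does the work, serializing a view writes out every element the view exposes, not the smaller buffer underneath it. A small check, again taking the strides from x.strides and using io.BytesIO as a stand-in for a file:

```python
import io
import pickle
import numpy as np
from numpy.lib import stride_tricks

x = np.arange(20).reshape(4, 5)
row, col = x.strides
arr = stride_tricks.as_strided(x, shape=(3, 2, 5), strides=(row, row, col))

# arr exposes 30 elements although only 20 exist in memory.
print(arr.nbytes // arr.itemsize, x.nbytes // x.itemsize)  # 30 20

# Both serializers write every exposed element, so the overlapping rows are
# duplicated and the payload outgrows x's own buffer.
payload = pickle.dumps(arr)
buf = io.BytesIO()
np.save(buf, arr)

print(len(payload) >= arr.nbytes, len(buf.getvalue()) >= arr.nbytes)  # True True
print(len(payload) > x.nbytes)  # True
```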
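Passing x and recreating the view in each process might be better: the view is only a shape and strides laid over x's buffer, so it is cheap to rebuild, and pickling x sends its 20 elements rather than the 30 that arr's overlapping windows expose.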
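A minimal sketch of that approach. Whether anything is pickled at all depends on the start method: with "spawn" (the default on Windows and macOS) and "forkserver", the Process object and its args are pickled and sent to the child, while with "fork" the child inherits a copy-on-write image of the parent's memory, so neither arr nor x is pickled. The sketch models the spawn case with a pickle round trip instead of starting real processes; make_view and worker are hypothetical names, and summing each window is only placeholder work. In a real program worker would be the target= of multiprocessing.Process, started under an `if __name__ == "__main__":` guard.

```python
import pickle
import numpy as np
from numpy.lib import stride_tricks


def make_view(x):
    """Hypothetical helper: rebuild the (3, 2, 5) window view over x."""
    row, col = x.strides
    return stride_tricks.as_strided(x, shape=(3, 2, 5), strides=(row, row, col))


def worker(x):
    """Hypothetical Process target: build the view locally, then use it."""
    arr = make_view(x)            # no data is copied here, only a new view
    return arr.sum(axis=(1, 2))   # placeholder work: one sum per 2-row window


x = np.arange(20).reshape(4, 5)

# Model what a spawned child receives: its own unpickled copy of x.
x_in_child = pickle.loads(pickle.dumps(x))

print(worker(x_in_child).tolist())  # [45, 95, 145]
```

The child pays for one copy of x during transfer, and every window it builds afterwards shares that copy instead of duplicating rows.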