I have a matrix of 3D brain images on which I am doing some processing.
The input matrix has the form M[X, Y], where X is the brain id and Y is the flattened image data, which I reshape later so that I can apply some enhancements.
The following sequential code does this correctly:
import numpy as np

def transform(X):
    # Restore each flattened row to a 3D volume: (n_brains, 176, 208, 176)
    data = np.reshape(X, (-1, 176, 208, 176))
    data_cropped = np.empty((data.shape[0], 90, 100, 70))
    for idx in range(0, data.shape[0]):
        # Crop every volume to the region of interest
        data_cropped[idx, :, :, :] = data[idx, 40:130, 40:140, 50:120]
    # perm() is defined elsewhere (not shown here)
    data_cropped = perm(data_cropped)
    #data_cropped = impute_data(data_cropped)
    # Flatten each cropped volume back into a single row
    data_cropped = np.reshape(data_cropped, (data_cropped.shape[0], -1))
    #data_cropped = data_cropped[:, np.apply_along_axis(np.count_nonzero, 0, data_cropped) != 0]
    return data_cropped

X_train = np.load("./data_original/X_train.npy")
X_crop = transform(X_train)
The output of this code portion when running sequentially (normal for loop) is:
brain: 0
brain: 1
brain: 2
brain: 3
...
The problem is that it takes a very long time (around 60 min) to process all the brains.
I tried to make the code run in parallel, but not all brains get processed! Only brain 0 is processed, multiple times.
Here is my attempt to parallelize the code:
import multiprocessing
from joblib import Parallel, delayed

num_cores = multiprocessing.cpu_count()
X_train = np.load("./data_original/X_train.npy")
X_crop = Parallel(n_jobs=num_cores)(delayed(transform)(i) for i in X_train)
But I got this result:
brain: 0
brain: 0
brain: 0
brain: 0
...
Any idea how to solve this problem? Thanks
Note that "for i in X_train" produces the rows of X_train (along the first dimension), one at a time, and each row has one dimension less than the initial array:
In [7]: a=np.random.random((2,10))
In [10]: a.shape
Out[10]: (2, 10)
In [11]: [i.shape for i in a]
Out[11]: [(10,), (10,)]
Since you didn't give all the sample code to reproduce the issue, I cannot say what shape your processing code expects.
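To illustrate what this means for the reshape shown in the question, here is a minimal sketch. It assumes transform() starts with a reshape to (-1, ...) as posted, and it uses a tiny made-up volume size of 2x3x4 instead of 176x208x176 so that it runs instantly. The array X is a stand-in, not your data.

```python
import numpy as np

# Stand-in for X_train: 3 "brains", each a flattened 2x3x4 volume
# (the volumes in the question are 176x208x176).
X = np.arange(3 * 24).reshape(3, 24)

for row in X:
    # Same pattern as the first line of transform(), with smaller dimensions.
    batch = np.reshape(row, (-1, 2, 3, 4))
    # A single row becomes a batch holding exactly one volume,
    # so the only index a loop over batch.shape[0] can visit is 0.
    print(batch.shape, list(range(batch.shape[0])))

shapes = [np.reshape(row, (-1, 2, 3, 4)).shape for row in X]
```

Every call therefore sees a one-element batch, which would be consistent with the index printed in each job always being 0.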
Then, apparently, the number after "brain:" is the index of a row within the input that a job receives. If you feed each job only a part of the input array, naturally, they will all produce the same indices. You need to somehow tell each job its starting index so that it can calculate absolute indices correctly.
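One possible way to do that is to split the input into contiguous chunks and pass each job its chunk together with the chunk's starting offset. The sketch below is only an illustration under stated assumptions: chunk_bounds, process_chunk and process_all are hypothetical names, process_chunk is a stand-in for your transform(), and plain lists with a thread pool from the standard library are used to keep it self-contained. Slicing a NumPy array works the same way.

```python
from concurrent.futures import ThreadPoolExecutor

def chunk_bounds(n_items, n_chunks):
    """Split range(n_items) into at most n_chunks contiguous (start, stop) pairs."""
    size, extra = divmod(n_items, n_chunks)
    bounds, start = [], 0
    for k in range(n_chunks):
        # The first `extra` chunks get one additional item.
        stop = start + size + (1 if k < extra else 0)
        if stop > start:
            bounds.append((start, stop))
        start = stop
    return bounds

def process_chunk(rows, start):
    """Hypothetical stand-in for transform(): tag each row with its absolute index."""
    return [(start + local_idx, sum(row)) for local_idx, row in enumerate(rows)]

def process_all(data, n_jobs=4):
    bounds = chunk_bounds(len(data), n_jobs)
    with ThreadPoolExecutor(max_workers=n_jobs) as pool:
        # map() returns results in the order of `bounds`, so chunks stay ordered.
        parts = pool.map(lambda b: process_chunk(data[b[0]:b[1]], b[0]), bounds)
        return [item for part in parts for item in part]

data = [[i, i + 1] for i in range(10)]  # stand-in for the rows of X_train
results = process_all(data, n_jobs=3)
absolute_ids = [idx for idx, _ in results]
print(absolute_ids)
```

With joblib, the same idea would presumably look like Parallel(n_jobs=num_cores)(delayed(process_chunk)(X_train[a:b], a) for a, b in bounds). Passing whole chunks rather than single rows also keeps the number of tasks small, and for CPU-bound work like this, process-based workers are usually the better fit than the threads used in the sketch.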