I am currently working on a nonlocal method for image denoising and was asked to build a kernel, let's call it w, with one entry for each pair of pixels of an image. For context, if an image has N×M pixels, then the kernel (stored as a matrix) will have (NM)^2 elements. Luckily, the matrix is sparse and symmetric, and its diagonal is known. So I designed a function win_largo(i, l), where i is the index of the i-th row of w and l is the number of elements I want to compute; it returns a CSR sparse row and its transpose.
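For concreteness, a minimal stub with this interface might look like the sketch below; the exponential weights are a placeholder, since the actual computation depends on the nonlocal method:

import numpy as np
from scipy import sparse

def win_largo(i, l):
    # Hypothetical sketch of the interface only: compute the l weights
    # w[i, i-l:i] of row i and return them as a 1 x l CSR row together
    # with its transpose. The decay below is a placeholder, not the
    # actual nonlocal weight computation.
    data = np.exp(-np.linspace(0.0, 5.0, l))
    row = sparse.csr_matrix(data.reshape(1, l))
    return row, row.T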
I have an image of 236^2 pixels, so in my case l = 1500 captures enough information, and I can compute the first 1500 rows and columns efficiently. However, there are 54,196 rows left. To compute them, I used the multiprocessing module like this:
import multiprocessing as mp

nm = N * M
pool = mp.Pool(processes=15)
results = [pool.apply_async(win_largo, args=(i, 1500)) for i in xrange(1500, nm)]
pool.close()
pool.join()
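As an aside, the same results could be collected with Pool.map and a small module-level wrapper; this is just a sketch of an equivalent alternative (note that map returns the values directly, so no .get() is needed later):

def win_largo_1500(i):
    # wrapper with l fixed at 1500, defined at module level so the
    # pool workers can pickle it
    return win_largo(i, 1500)

pool = mp.Pool(processes=15)
rows = pool.map(win_largo_1500, xrange(1500, nm))
pool.close()
pool.join()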
This gives a pretty quick calculation (a couple of minutes) of something that would otherwise take around 6.45 hours to compute serially. However, now I need to save the result.
To do so, I wrote the following loop, where w is again stored as a CSR matrix:
j = 0
for i in xrange(1500, nm):
    w[i, i-l:i], w[i-l:i, i] = results[j].get()
    j += 1
But it takes around 6.52 hours to complete. Is there a faster way to do this? As it stands, the multiprocessing speedup seems useless.
I am using Python 2.7 on macOS High Sierra.
I found a way.
First of all, slicing isn't the way to go. I tested the following:
%%timeit -n 2
ws = sparse.csr_matrix((nm, nm), dtype=np.float)
for i in xrange(5925, 6000):  # 75 iterations
    ws[i, i-l:i], ws[i-l:i, i] = win_largo(i, l)
I got
2 loops, best of 3: 2.02 s per loop
Meanwhile, win_largo itself takes only a couple of milliseconds to compute.
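This matches SciPy's documented behavior: writing into a csr_matrix changes its compressed sparsity structure, which is expensive and raises a SparseEfficiencyWarning. A small standalone demonstration:

import warnings
import numpy as np
from scipy import sparse

ws = sparse.csr_matrix((10, 10), dtype=np.float64)
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    ws[0, 1] = 1.0  # inserting a new nonzero rebuilds the CSR structure
print(caught[0].category.__name__)  # SparseEfficiencyWarning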
Since slicing isn't a good way to assign the computed objects, I decided to take advantage of the structure of the sparse slices returned by win_largo. I modified the function so that it returns the set of indices at which the row is nonzero, together with the corresponding data:
%%timeit -n 5
row_ind, col_ind, datos = [], [], []
for i in xrange(5500, 6000):  # 500 iterations
    indx_w, datos_w = win_largo(i, l)
    row_ind.extend(np.repeat(i, len(indx_w)).tolist())
    col_ind.extend(np.add(i - l, indx_w).tolist())
    datos.extend(datos_w.tolist())
What I got left me speechless:
5 loops, best of 3: 2.02 s per loop
With this approach, the whole loop took 3 min 32 s to complete: way less than 6 hours, approximately 0.92% of the original time. That's a huge improvement.
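For reference, the modified win_largo only needs to expose the nonzero offsets within the window and their values; a sketch of that interface (the weights and the threshold are placeholders):

import numpy as np

def win_largo(i, l):
    # Sketch of the modified interface: return the offsets within the
    # window [i-l, i) where row i is nonzero, plus the matching values.
    vals = np.exp(-np.linspace(0.0, 5.0, l))  # placeholder weights
    indx_w = np.nonzero(vals > 1e-8)[0]       # offsets relative to i - l
    datos_w = vals[indx_w]
    return indx_w, datos_w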
After this, I was able to add some extra data to each list, so that the resulting sparse matrix can be built as:
ws = sparse.csr_matrix((datos, (row_ind, col_ind)), shape=(nm, nm))
This just took a couple of seconds. In total, I was able to get the matrix in about 4 minutes :D
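The "extra data" would presumably be the symmetric entries and the known diagonal mentioned at the start; a sketch of how that could look (diag_w is a hypothetical name for the array holding the known diagonal):

# inside the loop, mirror each triplet so that w stays symmetric
row_ind.extend(np.add(i - l, indx_w).tolist())      # swap row and column
col_ind.extend(np.repeat(i, len(indx_w)).tolist())
datos.extend(datos_w.tolist())

# after the loop, append the known diagonal
row_ind.extend(range(nm))
col_ind.extend(range(nm))
datos.extend(diag_w.tolist())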