Search code examples
pythonmultiprocessingslicepost-processing

Python - Assign slices to sparse matrix after multiprocessing


I am currently working on a nonlocal method for image denoising and was asked to build a kernel, let's call it w, indexed by each pixel of an image. For context, if an image has NxM pixels, then the kernel (stored as a matrix) will have (NxM)^4 elements. Luckily, the matrix is sparse, symmetric, and its diagonal is known. So I designed a function win_largo(i,l), where i is the index of the i-th row of w and l is the number of elements I want to compute, this returns a CSR sparse row and its transpose.

I have an image of 236^2pixels, so in my case l=1500 has enough information and I can compute the first 1500 rows and columns efficiently. However, there are 54 196 rows left. To compute them, I used the multiprocessing module like this:

import multiprocessing as mp
nm = N*M

pool    = mp.Pool(processes=15)
results = [pool.apply_async(win_largo, args=(i,1500,) ) for i in xrange(1500, nm) ]
pool.close()
pool.join()

I get a pretty quick calculation (couple minutes), that would instead take around 6.45 hours to compute. However, now I need to save this result.

To do so, I wrote the following loop, where w is again stored as a CSR matrix:

j = 0
for i in xrange(1500,nm):
    w[i,i-l:i], w[i-l:i, i] = results[j].get()
    j += 1

But it takes around 6.52 hours to complete. Is there a faster way to do this? Nowadays it seems like the multiprocessing loop is useless.

I am using python 2.7 with macOS High Sierra.


Solution

  • I found a way.

    First of all, slicing isn't the way. By testing the following:

    %%timeit -n 2
    ws  = sparse.csr_matrix((nm, nm), dtype=np.float)
    for i in xrange(5925,6000):                        # 75 iterations
        ws[i,i-l:i], ws[i-l:i, i] = win_largo(i,l)
    

    I got

    2 loops, best of 3: 2.02 s per loop
    

    Whereas, win_largo takes a couple of mili seconds to compute.

    Since slicing isn't the best way to assign the computed objects, I decided to take an advantage of the elements of the sparse slices returned by win_largo. Therefore, I modified the function so that it returns the set of indices in which it is non zero and its corresponding data set:

    %%timeit -n 5
    row_ind, col_ind, datos = [], [], []
    
    for i in xrange(5500,6000):                        # 500 iterations
        indx_w, datos_w = win_largo(i,l)
        row_ind.extend( np.repeat(i, len(indx_w)).tolist() )
        col_ind.extend( np.add(i-l, indx_w).tolist() )
        datos.extend( datos_w.tolist() )
    

    What I got, left me speechless:

    5 loops, best of 3: 2.02 s per loop
    

    Therefore, the whole loop took 3 min and 32 s to complete. WAY less than 6 hours… approximately 0.92% of the original time. That's way a lot.

    After this, I was able to add some extra data to each list, so that the resulting sparse object would be:

    ws = sparse.csr_matrix((datos, (row_ind, col_ind)), [nm, nm])
    

    This just took a couple seconds. In total, I was able to get this matrix in 4 minutes :D