python python-3.x python-multiprocessing

Python - Multiprocessing with multiple for-loops

I know there are other questions asked concerning this topic so I'm sorry I have to ask it again, but I cannot get it to work since I'm quite new to this topic.

I have four nested for-loops in which certain algebraic calculations are done (matrix operations, for example). These calculations take too long to complete, so I was hoping I could speed them up with multiprocessing.

The code is given below. The ranges and matrix sizes here are simulated, but they are the same sizes I use in my real code (so it's not surprising that it takes so long). You should be able to run it directly by copy-pasting the code.

import numpy as np
from scipy.linalg import fractional_matrix_power
import math

#Lists for the loop (and one value)
x_list = np.arange(0, 32, 1)
y_list = np.arange(0, 32, 1)
a_list = np.arange(0, 501, 1)
b_list = np.arange(0, 501, 1)
c_list = np.arange(0, 64, 1)
d_number = 32

#Matrices
Y = np.arange(2048).reshape(32, 64)
g = np.asmatrix(np.empty([d_number, 1], dtype=np.complex128))
A = np.empty([len(a_list), len(b_list), len(c_list)], dtype=np.complex128)
A_in = np.empty([len(a_list), len(b_list)], dtype=np.complex128)


for ai in range(len(a_list)):
    
    for bi in range(len(b_list)):
        
        for ci in range(len(c_list)):
            
            f_k_i = c_list[ci]
            X_i = np.asmatrix([Y[:, ci]]).T
            
            for di in range(d_number):
                
                r = math.sqrt((x_list[di] - a_list[ai])**2 + (y_list[di] - b_list[bi])**2 + 63**2)
                g[di, 0] = np.exp(-2 * np.pi * 1j * f_k_i * (r / 8)) / r #g is a vector
            
            A[-bi, -ai, ci] = ((1 / np.linalg.norm(g)**2) * (((g.conj().T * fractional_matrix_power((X_i * X_i.conj().T), (1/5)) * g) / np.linalg.norm(g)**2)**2)).item(0)
         
        A_in[-bi, -ai] = (1 / len(c_list)) * sum(A[-bi, -ai, :])

What is the best way to approach this? If multiprocessing is the solution, how would I implement it for my case (I couldn't figure that out)?

Thanks in advance.

Solution

One way to approach it would be to move the two inner loops into a function that takes ai and bi as parameters and returns the indices and the result. Then use multiprocessing.Pool.imap_unordered() to run the function on (ai, bi) pairs. Something like this (untested):

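import itertools
import multiprocessing

def partial_calc(index):
    """
    This function replaces the inner two loops to calculate the value of
    A_in[-bi, -ai]. index is a tuple (ai, bi).

    It relies on the module-level names from the question (c_list, Y, g, A,
    etc.); each worker process works on its own copy of them.
    """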

    ai, bi = index

    for ci in range(len(c_list)):
        
        f_k_i = c_list[ci]
        X_i = np.asmatrix([Y[:, ci]]).T
        
        for di in range(d_number):
            
            r = math.sqrt((x_list[di] - a_list[ai])**2 + (y_list[di] - b_list[bi])**2 + 63**2)
            g[di, 0] = np.exp(-2 * np.pi * 1j * f_k_i * (r / 8)) / r #g is a vector
        
        A[-bi, -ai, ci] = ((1 / np.linalg.norm(g)**2) * (((g.conj().T * fractional_matrix_power((X_i * X_i.conj().T), (1/5)) * g) / np.linalg.norm(g)**2)**2)).item(0)
        
    return ai, bi, (1 / len(c_list)) * sum(A[-bi, -ai, :])
    

def main():

    with multiprocessing.Pool(None) as p:
        # this replaces the outer two loops
        indices = itertools.product(range(len(a_list)), range(len(b_list)))

        partial_results = p.imap_unordered(partial_calc, indices)        
         
        for ai, bi, value in partial_results:
            A_in[-bi, -ai] = value

    #... do something with A_in ...

if __name__ == "__main__":
    main()

Or put the inner three loops into the function and generate one "row" for A_in at a time. Profile it both ways and see which is faster.
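
A rough sketch of that second variant, with one task per ai instead of one per (ai, bi) pair. Everything here is an assumption made for illustration: the sizes are tiny, cell_value() is a cheap stand-in for the real matrix expression so the sketch stays dependency-free, and the result is indexed as result[ai][bi] rather than A_in[-bi, -ai]. The serial version is kept alongside so the two can be compared and timed.

```python
import math
import multiprocessing
import time

# Small stand-in sizes; the question uses 501 x 501 x 64 x 32.
N_A, N_B, N_C, N_D = 6, 5, 4, 3


def cell_value(ai, bi):
    """Hypothetical cheap stand-in for the two innermost loops (ci and di)."""
    total = 0.0
    for ci in range(N_C):
        for di in range(N_D):
            r = math.sqrt((di - ai) ** 2 + (di - bi) ** 2 + 63 ** 2)
            total += math.cos(ci * r / 8) / r
    return total / N_C


def row_calc(ai):
    """Run the inner three loops: every bi for one fixed ai."""
    return ai, [cell_value(ai, bi) for bi in range(N_B)]


def fill_serial():
    """Reference result computed without a pool."""
    return [row_calc(ai)[1] for ai in range(N_A)]


def fill_parallel(processes=None):
    """Same result, computed with one pool task per ai."""
    result = [None] * N_A
    with multiprocessing.Pool(processes) as pool:
        # Results arrive in arbitrary order, so ai is returned with each row.
        for ai, row in pool.imap_unordered(row_calc, range(N_A)):
            result[ai] = row
    return result


if __name__ == "__main__":
    for fill in (fill_serial, fill_parallel):
        start = time.perf_counter()
        fill()
        print(fill.__name__, time.perf_counter() - start)
```

With sizes this small the pool overhead will likely dominate, so the timing only becomes meaningful once the real calculation and the real ranges are plugged in.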

The trick will be setting up the lists (a_list, b_list, etc.) and the Y matrix so that the worker processes can use them. How best to do that depends on their characteristics (constant, quickly/slowly calculated, large/small, etc.).
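
If the inputs are not simply defined at module level (for example, if they are loaded or computed inside main()), one possible way to hand them to the workers is the initializer / initargs arguments of multiprocessing.Pool, which run a setup function once in each worker process. The following is only a minimal sketch under that assumption: init_worker, _shared, run and the toy formula inside partial_calc are hypothetical names and placeholders, not part of the original code.

```python
import itertools
import multiprocessing

_shared = {}  # filled once per worker process by init_worker


def init_worker(a_values, b_values):
    """Runs once in each worker; stores the read-only inputs for later tasks."""
    _shared["a"] = a_values
    _shared["b"] = b_values


def partial_calc(index):
    """Hypothetical stand-in for the real per-(ai, bi) computation."""
    ai, bi = index
    return ai, bi, _shared["a"][ai] * 10 + _shared["b"][bi]


def run(a_values, b_values, processes=None):
    """Compute one value per (ai, bi) pair using a pool of workers."""
    indices = itertools.product(range(len(a_values)), range(len(b_values)))
    out = {}
    with multiprocessing.Pool(processes, initializer=init_worker,
                              initargs=(a_values, b_values)) as pool:
        # chunksize batches several index pairs into each message to a worker.
        for ai, bi, value in pool.imap_unordered(partial_calc, indices, chunksize=4):
            out[ai, bi] = value
    return out


if __name__ == "__main__":
    print(run([0, 1, 2], [0, 1]))
```

The inputs are sent to each worker once rather than with every task, which matters more as the arrays get larger. For constant data that is cheap to build, plain module-level definitions as in the question may be all that is needed.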