python numpy matrix parallel-processing multiplication

Matrix multiplication with parallel programming in numpy python

I'm new to Python, and I need to convert ordinary matrix multiplication code to a parallel version using numpy. This is the function I need to parallelize:


def RowByColumn(A, B):
    matrix=[]
    for i in range(len(A)):
        matrix.append([])
        for j in range(len(B[0])):  # one result column per column of B
            matrix[i].append(matrix_multiplication(getRow(A,i),getColumn(B,j)))
    return matrix
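
The helper functions getRow, getColumn and matrix_multiplication are not shown above. A minimal sketch of what they are assumed to do, with matrices stored as lists of lists and matrix_multiplication returning the dot product of one row and one column (these definitions are an assumption, not the original code):

```python
def getRow(M, i):
    # Assumed helper: return row i of a list-of-lists matrix.
    return list(M[i])


def getColumn(M, j):
    # Assumed helper: collect element j of every row, i.e. column j.
    return [row[j] for row in M]


def matrix_multiplication(row, column):
    # Assumed helper: despite its name, this is taken to be the dot
    # product of one row of A with one column of B.
    return sum(a * b for a, b in zip(row, column))
```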

How can I apply the multiprocessing module, or another parallel processing module, to this in Python? Could anyone show me how?

Solution

I do not recommend multiprocessing here: processes are slow to create, and they carry a large communication overhead because the data has to be serialized and sent between them. You can use numba instead, which JIT-compiles your function to machine code and works well with numpy arrays. It also makes it easy to add thread-based parallelism:

import numpy as np
from numba import njit, prange

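@njit(parallel=True)
def mat_mult(A, B):
    # Inner dimensions must match: (m x n) times (n x c)
    assert A.shape[1] == B.shape[0]
    # Result is (m x c); np.zeros defaults to float64 even for integer inputs
    res = np.zeros((A.shape[0], B.shape[1]))
    # prange splits the rows of the result across threads
    for i in prange(A.shape[0]):
        for k in range(A.shape[1]):
            # j is innermost, so res[i, :] and B[k, :] are walked row-wise
            for j in range(B.shape[1]):
                res[i,j] += A[i,k] * B[k,j]
    return res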

m, n, c = 1000, 1500, 1200
A = np.random.randint(1, 50, size = (m, n))
B = np.random.randint(1, 50, size = (n, c))

res = mat_mult(A, B)

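Use prange instead of range to mark a loop whose iterations can run in parallel. I have only parallelized the outer loop, so each thread writes to its own rows of res and no two threads update the same element. Numba runs only the outermost prange in parallel and treats any prange nested inside it as a plain range, so marking the inner loops as well would not add more parallelism.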

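Note also the order of the loops (i, k, j). With j innermost, res[i,j] and B[k,j] are both accessed along a row, which is contiguous in memory in numpy's default row-major layout, and this reduces cache misses.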
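
To make the loop-order point concrete, here is a minimal sketch in plain Python (lists of lists, no numba) of the two orderings. The function names matmul_ijk and matmul_ikj are made up for this illustration. Both compute the same product; only the order in which B is visited differs. Plain Python lists will not show the cache effect itself, so this only illustrates the access pattern:

```python
def matmul_ijk(A, B):
    # Textbook order: for a fixed (i, j), the k loop walks down a column of B,
    # touching a different row of B at every step.
    n, m, p = len(A), len(B), len(B[0])
    res = [[0] * p for _ in range(n)]
    for i in range(n):
        for j in range(p):
            for k in range(m):
                res[i][j] += A[i][k] * B[k][j]
    return res


def matmul_ikj(A, B):
    # Order used in the answer: for a fixed (i, k), the j loop walks along
    # row k of B and row i of res, left to right.
    n, m, p = len(A), len(B), len(B[0])
    res = [[0] * p for _ in range(n)]
    for i in range(n):
        for k in range(m):
            a_ik = A[i][k]  # constant for the whole inner loop
            for j in range(p):
                res[i][j] += a_ik * B[k][j]
    return res
```

With a row-major numpy array, walking along a row means stepping through adjacent memory addresses, which is why the second ordering tends to be friendlier to the cache.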

On my laptop, the parallel function takes about 1.4 seconds to run, while numpy's matrix multiplication A @ B takes 4.4 seconds on the same integer matrices, so there is some improvement.
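
If you want to reproduce such a comparison, two things can skew it. The first call to an @njit function includes compilation time. Also, A and B here are integer arrays, for which numpy does not dispatch to an optimized BLAS routine, so timings with float arrays may look quite different. A minimal timing sketch that handles the first point with a warm-up call (best_of is a hypothetical helper name, not part of numba or numpy):

```python
import time


def best_of(func, repeats=3):
    """Call func once to warm up, then return the best wall-clock time in seconds."""
    func()  # warm-up call: for a numba function this triggers JIT compilation
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        timings.append(time.perf_counter() - start)
    return min(timings)


# Intended use with the names from the code above (not executed here):
#   t_numba = best_of(lambda: mat_mult(A, B))
#   t_numpy = best_of(lambda: A @ B)
```

Taking the minimum over a few repeats is a common way to reduce noise from other processes, but the numbers will still depend on your machine and thread count.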