Tags: python, numpy, slurm, lapack, intel-mkl

numpy matrix multiplication does not work when parallelized on HPC


I have two dense matrices with shapes (2500, 208) and (208, 2500), and I want to compute their product. This works fine and fast in a single process, but inside a multiprocessing block the processes get stuck there for hours. I multiply sparse matrices of even larger sizes without any problem. My code looks like this:

import numpy as np
from multiprocessing import Pool

def run_func(args):
    # Do stuff, including large sparse matrix multiplications (those work fine).
    C = np.matmul(A, B)  # or A.dot(B), or even calling the BLAS library directly: dgemm(1, A, B)
    # Execution never gets past the line above!

with Pool(processes=agents) as pool:
    result = pool.starmap(run_func, args)
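For reference, a minimal single-process version of the multiplication (shapes from the question; random data is just a stand-in for the real matrices) runs without issue:

import numpy as np

# Same shapes as in the question; random data for illustration only.
A = np.random.rand(2500, 208)
B = np.random.rand(208, 2500)

C = np.matmul(A, B)  # fast in a single process
print(C.shape)       # (2500, 2500)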

Note that when the function run_func is executed in a single process, it works fine. Multiprocessing on my local machine also works fine. It is only multiprocessing on the HPC that gets stuck. I allocate my resources like this:

srun -v --nodes=1 --time 7-0:0 --cpus-per-task=2 --mem-per-cpu=20G python3 -u run.py 2

Where the last parameter is the number of agents in the code above. Here are the BLAS/LAPACK details for the numpy installation on the HPC (obtained from numpy's build configuration):

blas_mkl_info:
    libraries = ['mkl_rt', 'pthread']
    library_dirs = ['**/lib']
    define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
    include_dirs = ['**/include']
blas_opt_info:
    libraries = ['mkl_rt', 'pthread']
    library_dirs = ['**/lib']
    define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
    include_dirs = ['**/include']
lapack_mkl_info:
    libraries = ['mkl_rt', 'pthread']
    library_dirs = ['**/lib']
    define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
    include_dirs = ['**/include']
lapack_opt_info:
    libraries = ['mkl_rt', 'pthread']
    library_dirs = ['**/lib']
    define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
    include_dirs = ['**/include']
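
Output of this form can be reproduced with numpy's built-in configuration helper:

import numpy as np

np.__config__.show()  # prints the BLAS/LAPACK build configuration numpy was linked against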

All Python packages and the Python version on the HPC are the same as on my local machine. Any leads on what is going on?


Solution

  • As a workaround, I tried multithreading instead of multiprocessing, and the issue is resolved now. I am still not sure what the underlying problem with multiprocessing is, though. A sketch of the change is below.
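
A minimal sketch of that workaround, assuming run_func and args from the question: multiprocessing.pool.ThreadPool is a thread-based drop-in for Pool with the same API.

from multiprocessing.pool import ThreadPool  # thread-based, same API as Pool

# Only the pool type changes; run_func and args are unchanged.
with ThreadPool(processes=agents) as pool:
    result = pool.starmap(run_func, args)

Since numpy releases the GIL during BLAS-backed operations such as matmul, the threads can still run the multiplications concurrently despite sharing one interpreter.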