Tags: python, multiprocessing, cluster-computing, pbs, mpi4py

PBS cluster - multiple Python simulations


I have to run multiple simulations of the same model with varying parameters (or random number generator seeds). Previously I worked on a server with many cores, where I used the Python multiprocessing library with apply_async. This was very handy, as I could set the maximum number of cores to occupy and the simulations would simply queue up.

Now I have moved to a place with an HPC cluster managed by PBS. From trial and error and various answers, it seems that multiprocessing only works within a single node. Is there a way to make it work across many nodes, or any other library that provides the same functionality and is just as easy to use in a few lines?

To give you an idea of my code:

import numpy as np
import pandas as pd
import functions_library as L
import multiprocessing as mp

if __name__ == "__main__":

    N = 100      # number of independent simulations
    proc = 50    # maximum number of worker processes on this node
    pool = mp.Pool(processes=proc)

    seed = 342
    np.random.seed(seed)
    seeds = np.random.randint(low=1, high=100000, size=N)

    resul = []
    for SEED in seeds:
        SEED = int(SEED)
        # each simulation gets its own seed (plus whatever other arguments it needs)
        resul.append(pool.apply_async(L.some_function, args=(SEED,)))
        print(SEED)

    results = [p.get() for p in resul]
    pool.close()
    pool.join()

    database = pd.DataFrame(results)
    database.to_csv("prova.csv")

EDIT

As I understand it, mpi4py might be helpful, since it interacts naturally with PBS. Is that correct? How can I adapt my code to mpi4py?


Solution

  • I have found that the schwimmbad package is quite handy for running code written for multiprocessing on an MPI cluster with minimal changes (see the sketch below).

    I hope it helps!
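
As a rough illustration of what that adaptation could look like (L.some_function, the seeds, and the output file are simply carried over from the question; the details of the pool setup may differ on your cluster), the usual pattern with schwimmbad's MPIPool is to let the master rank distribute the tasks and collect the results, while the worker ranks just wait for work:

import sys
import numpy as np
import pandas as pd
from schwimmbad import MPIPool

import functions_library as L

def run_one(SEED):
    # one simulation per seed, mirroring the apply_async call in the question
    return L.some_function(int(SEED))

if __name__ == "__main__":
    pool = MPIPool()

    # only the master process builds the task list and gathers results;
    # worker processes wait for tasks and exit when the pool is closed
    if not pool.is_master():
        pool.wait()
        sys.exit(0)

    np.random.seed(342)
    seeds = np.random.randint(low=1, high=100000, size=100)

    results = pool.map(run_one, seeds)
    pool.close()

    pd.DataFrame(results).to_csv("prova.csv")

The script is then launched through MPI from inside the PBS job script, with something like mpiexec -n 50 python script.py (the exact mpirun/mpiexec invocation and the number of ranks depend on your cluster's configuration); one rank acts as the master and the remaining ranks run the simulations, so the ranks can be spread over several nodes.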