I am new to parallelization and MPI, and I am learning and experimenting with mpi4py. Currently I am trying to optimize the performance of a method which randomly samples for a desired point (one satisfying a condition) in an interval.
To give you a detailed idea, I created a sample program which is similar to my program. The aim of this program is to output 20 numbers between 9.9999 and 10.0. This is done by randomly sampling from [0.0, 1.0] and multiplying by 10 (a factor that shrinks by an infinitesimally small amount each iteration).
The following is the function, with comments provided:
import numpy as np
import time

def fit(iterations, value):
    array = []
    sample = None
    # This runs for a fixed number of iterations. Each iteration contributes
    # one sample to the accumulator array (here called `array`).
    for iteration in range(iterations):
        while True:
            arbit = np.random.uniform(0, 1)
            # The main condition for sampling. In my original program this is
            # bound to change with each iteration, so I made it depend on the
            # iteration in this test program.
            if (10 - 0.000001 * iteration) * arbit > value:
                sample = 10 * arbit
                break
        # Accumulate the accepted sample. With n iterations, n samples are expected.
        array.append(sample)
    print("Result: " + str(array))

if __name__ == '__main__':
    startTime = time.time()
    fit(20, 9.9999)
    elapsedTime = time.time() - startTime
    print("\ntime taken: " + str(elapsedTime) + "\n")
As you can see, all the action happens in the while loop in the fit() method. What I want to do is take advantage of parallelization and mpi4py to make this method faster. For example, I want to start n processes, and when the while loop comes, the processes run in parallel; whichever one finds the desired value first, I want to take that value, add it to the accumulator array, and abort all other processes. Then I want to continue this routine in the next iteration, and so on, until the method finishes. Is something like this possible? If not this way, what other way can I use parallelization in the above function?
Thanks
The general ideas behind parallelization are heavily application-dependent. The more independent you can make your processes, the better. Inter-process communication adds hassle and overhead. This is especially true if your processes reside on different computers.
With your sample code above, the simple way to make it parallel would be to split it by iterations. You would have a list of iterations and a number of worker processes which would churn through one iteration cycle at a time. Even if you needed the results in order, you could sort them afterwards, so it does not really matter whether you go through iterations 0, 1, 2, 3... or e.g. 17, 3, 14, 1, 5...
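For example, a minimal mpi4py sketch of that split might look like the following. The helper fit_one_iteration, the round-robin distribution of iterations, and the per-rank seed offset are all my choices for illustration, not anything prescribed by your code.

import numpy as np
from mpi4py import MPI

def fit_one_iteration(iteration, value):
    # The same rejection loop as in the question, for a single iteration.
    while True:
        arbit = np.random.uniform(0, 1)
        if (10 - 0.000001 * iteration) * arbit > value:
            return 10 * arbit

if __name__ == '__main__':
    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()
    size = comm.Get_size()

    iterations, value = 20, 9.9999
    # Seed each rank differently so they do not draw identical sequences.
    np.random.seed(12345 + rank)

    # Round-robin split: rank r handles iterations r, r + size, r + 2*size, ...
    my_pairs = [(i, fit_one_iteration(i, value))
                for i in range(rank, iterations, size)]

    # Collect all (iteration, sample) pairs on rank 0 and restore the order.
    gathered = comm.gather(my_pairs, root=0)
    if rank == 0:
        ordered = sorted(pair for chunk in gathered for pair in chunk)
        print("Result: " + str([sample for _, sample in ordered]))

You would run it with something like mpiexec -n 4 python fit_mpi.py (the filename is hypothetical). Since the iterations are fully independent, the only communication is the final gather.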
But what you seem to suggest is that you split each iteration cycle into parallel loops looking for a suitable number. That is possible (but make sure you use different random seeds in different processes, otherwise they will all replicate the same sequence), and the communication needed is very simple:

- when a worker finds a suitable number, it tells all the other workers
- the other workers stop their current search, record that number, and move on to the next iteration
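A hedged sketch of one such iteration follows; the message tag FOUND_TAG and the simple "first message wins" rule are my assumptions, not a standard recipe.

import numpy as np
from mpi4py import MPI

FOUND_TAG = 7  # arbitrary tag for "I found a sample" messages (my choice)

def search_together(comm, iteration, value):
    # Every rank searches; the first to succeed notifies the others, which
    # abandon their own search and adopt the received sample instead.
    rank, size = comm.Get_rank(), comm.Get_size()
    while True:
        # Check for a notification from another worker before sampling again.
        if comm.Iprobe(source=MPI.ANY_SOURCE, tag=FOUND_TAG):
            return comm.recv(source=MPI.ANY_SOURCE, tag=FOUND_TAG)
        arbit = np.random.uniform(0, 1)
        if (10 - 0.000001 * iteration) * arbit > value:
            sample = 10 * arbit
            # Tell every other rank; they pick this up in their next probe.
            for other in range(size):
                if other != rank:
                    comm.isend(sample, dest=other, tag=FOUND_TAG)
            return sample

Note that this sketch glosses over two real issues: two workers can succeed almost simultaneously, leaving stale messages behind for the next iteration, and the isend requests are never explicitly completed. A robust version would drain pending FOUND_TAG messages (and wait on the requests) between iterations.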
There are several ways to accomplish this. The description above assumes the workers are active, but it is often easier to make stupid workers which only indicate when they are done and start doing things when they are told to. In that case you only need point-to-point communication between the master and the slaves.
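A minimal sketch of that master-slave arrangement follows, assuming the master announces each iteration, waits (blocking) for the first result, and then stops the others; the tag values and the use of the iteration number to discard stale results are my choices. It needs at least two ranks, since the master does no sampling itself.

import numpy as np
from mpi4py import MPI

TASK_TAG, RESULT_TAG, STOP_TAG = 1, 2, 3   # tag values are my convention

def fit_master_slave(iterations, value):
    comm = MPI.COMM_WORLD
    rank, size = comm.Get_rank(), comm.Get_size()

    if rank == 0:
        array = []
        for iteration in range(iterations):
            # Tell every slave which iteration to search.
            for w in range(1, size):
                comm.send(iteration, dest=w, tag=TASK_TAG)
            # Blocking wait until the first current-iteration result arrives;
            # stale results from earlier iterations are simply discarded.
            while True:
                it, sample = comm.recv(source=MPI.ANY_SOURCE, tag=RESULT_TAG)
                if it == iteration:
                    break
            # Stop the remaining searches for this iteration.
            for w in range(1, size):
                comm.send(None, dest=w, tag=STOP_TAG)
            array.append(sample)
        print("Result: " + str(array))
    else:
        np.random.seed(1000 + rank)     # distinct stream per slave
        for _ in range(iterations):
            iteration = comm.recv(source=0, tag=TASK_TAG)
            while True:
                # Poll: has the master already stopped this iteration?
                if comm.Iprobe(source=0, tag=STOP_TAG):
                    comm.recv(source=0, tag=STOP_TAG)
                    break
                arbit = np.random.uniform(0, 1)
                if (10 - 0.000001 * iteration) * arbit > value:
                    comm.send((iteration, 10 * arbit), dest=0, tag=RESULT_TAG)
                    comm.recv(source=0, tag=STOP_TAG)   # wait for the stop
                    break

if __name__ == '__main__':
    fit_master_slave(20, 9.9999)

A production version should also drain any results still in flight before finishing, but for a sketch this shows the shape of the point-to-point traffic.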
In any case the workers have to poll the communication regularly while they are doing their work, and from the performance point of view the polling interval is important: if you poll very frequently, you lose time polling, and if your polling interval is very long, you lose time between something happening and your noticing it. The situation is easier for the master, which can use blocking communication and just sit and wait until the workers say something.
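To make that tradeoff concrete, here is the slave's inner loop from the sketch above with the probe throttled to once every poll_every draws; poll_every is a hypothetical tuning knob you would calibrate for your own workload, and STOP_TAG is the same tag as before.

import numpy as np

STOP_TAG = 3  # same stop tag as in the previous sketch

def search_with_throttled_polling(comm, iteration, value, poll_every=1000):
    # Drop-in replacement for the slave's inner loop above: probe for the
    # master's stop message only once every poll_every draws.
    draws = 0
    while True:
        draws += 1
        if draws % poll_every == 0 and comm.Iprobe(source=0, tag=STOP_TAG):
            comm.recv(source=0, tag=STOP_TAG)
            return None                  # stopped by the master, no sample
        arbit = np.random.uniform(0, 1)
        if (10 - 0.000001 * iteration) * arbit > value:
            return 10 * arbit            # found a sample; caller reports it

The caller would then send the sample (if any) to the master and consume the pending stop message, as in the previous sketch.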
So, you may use broadcasts between workers, or you may use master-slave communication, or some combination of the two. There are pros and cons to each approach, and the optimal solution depends on your application and requirements. (I usually pick the solution which is simplest to write, and optimize only if there is a clear need.)
I am only superficially familiar with MPI, but the answer to your question is "yes": it can be done with MPI.