Tags: python, mpi, hpc, mpi4py, multiple-processes

Is it possible that some processes in this program finish sooner than others?


I have a program that is designed to be highly parallelizable. I suspect that some processors finish this Python script sooner than others, which would explain behavior I observe upstream of this code. Does this code allow some MPI processes to finish sooner than others?

from mpi4py import MPI
import numpy as np
import pandas as pd

# Excerpt from a method: self, coe_threshold and logf are defined elsewhere.
dacout = 'output_file.out'
comm = MPI.COMM_WORLD
rank = comm.Get_rank()
nam = 'lcoe.coe'
csize = 10000

# Find the index of the last line in the file (every rank does this itself).
with open(dacout) as f:
    for i, l in enumerate(f):
        pass
numlines = i

# Read the file in chunks; the separator never matches, so each row is one string.
dakchunks = pd.read_csv(dacout, skiprows=0, chunksize=csize, sep='there_are_no_seperators')
linespassed = 0
vals = {}
for dchunk in dakchunks:
    for line in dchunk.values:
        linespassed += 1
        # Skip the leading and trailing lines of the file.
        if linespassed < 49 or linespassed > numlines - 50:
            continue
        split_line = ''.join(str(s) for s in line).split()
        if len(split_line) != 2:
            continue
        if split_line[0] == 'nan' or split_line[0] == '-nan':
            continue
        if split_line[1] != nam:
            continue
        # float() raises ValueError (not NameError) on unparsable text.
        try:
            value = float(split_line[0])
        except ValueError:
            continue
        vals.setdefault(split_line[1], []).append(value)

# Calculate mean and x s.t. Percentile_x(coe_dat)<threshold_coe
self.coe_vals = sorted(vals[nam])
self.mean_coe = np.mean(self.coe_vals)
self.p90 = np.percentile(self.coe_vals, 90)
self.p95 = np.percentile(self.coe_vals, 95)

# Count sorted values up to and including the first one above the threshold.
count_vals = 0.00
for i in self.coe_vals:
    count_vals += 1
    if i > coe_threshold:
        break
self.perc = 100 * (count_vals / len(self.coe_vals))

# Only rank 0 writes the results (Python 2 print syntax, as in the original).
if rank == 0:
    print >>logf, self.rp, self.rd, self.hh, self.mean_coe
    print self.rp, self.rd, self.hh, self.mean_coe, self.p90, self.perc

Solution

  • In the code you posted, every process reads the same file and computes the same thing, but only process 0 prints the result. This is not parallel computing; it is doing the same work multiple times!

    Some processes can finish this script before others because the script does not end with a barrier. Use comm.barrier() to synchronize all processes of the communicator comm, but only if it is necessary: barriers can hurt performance.
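
To make both points concrete, here is a minimal sketch of how the work could be divided between ranks and then synchronized at the end. It is not the asker's code: split_range and run_with_mpi are hypothetical helper names, the 1000 items stand in for the lines of the file, and summing integers stands in for the real per-line work. The mpi4py calls used (Get_rank, Get_size, reduce, Barrier) are the library's standard collective API.

```python
def split_range(n_items, size, rank):
    """Return the half-open range [start, stop) of items owned by `rank`.

    Hypothetical helper: items are divided as evenly as possible, and the
    first `n_items % size` ranks each get one extra item.
    """
    base, extra = divmod(n_items, size)
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return start, stop


def run_with_mpi(n_items=1000):
    """Each rank processes only its own slice, then all ranks synchronize."""
    # Imported here so that split_range can be used without MPI installed.
    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()
    size = comm.Get_size()

    # Each rank works on a different part of the data (placeholder work).
    start, stop = split_range(n_items, size, rank)
    local_sum = sum(range(start, stop))

    # Combine the partial results on rank 0.
    total = comm.reduce(local_sum, op=MPI.SUM, root=0)

    # No rank gets past this line until every rank has reached it.
    comm.Barrier()

    if rank == 0:
        print(total)
    return total
```

Calling run_with_mpi() from a script started with something like mpiexec -n 4 python script.py would let each of the four ranks handle a quarter of the items. The barrier is only needed if whatever runs next assumes that every rank has finished; otherwise it can be left out.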