I have a program that is designed to be highly parallelizable. I suspect that some processors are finishing this Python script sooner than others, which would explain behavior I observe upstream of this code. Is it possible that this code allows some MPI processes to finish sooner than others?
from mpi4py import MPI
import numpy as np
import pandas as pd

dacout = 'output_file.out'
comm = MPI.COMM_WORLD
rank = comm.Get_rank()
nam = 'lcoe.coe'
csize = 10000

# Count the lines in the output file
with open(dacout) as f:
    for i, l in enumerate(f):
        pass
numlines = i

# Read the file in chunks; the separator string never matches,
# so each line is kept as a single field
dakchunks = pd.read_csv(dacout, skiprows=0, chunksize=csize, sep='there_are_no_seperators')
linespassed = 0
vals = {}
for dchunk in dakchunks:
    for line in dchunk.values:
        linespassed += 1
        # Skip the header and footer regions of the file
        if linespassed < 49 or linespassed > numlines - 50:
            continue
        else:
            split_line = ''.join(str(s) for s in line).split()
            if len(split_line) == 2:
                if split_line[0] == 'nan' or split_line[0] == '-nan': continue
                if split_line[1] != nam: continue
                if split_line[1] not in vals:
                    try: vals[split_line[1]] = [float(split_line[0])]
                    except NameError: continue
                else: vals[split_line[1]].append(float(split_line[0]))

# Calculate mean and x s.t. Percentile_x(coe_dat) < threshold_coe
self.coe_vals = sorted(vals[nam])
self.mean_coe = np.mean(self.coe_vals)
self.p90 = np.percentile(self.coe_vals, 90)
self.p95 = np.percentile(self.coe_vals, 95)

count_vals = 0.00
for i in self.coe_vals:
    count_vals += 1
    if i > coe_threshold: break
self.perc = 100 * (count_vals / len(self.coe_vals))

if rank == 0:
    print >> logf, self.rp, self.rd, self.hh, self.mean_coe
    print self.rp, self.rd, self.hh, self.mean_coe, self.p90, self.perc
In the code you posted, every process reads the same file and computes the same thing; the only difference is that process 0 is the one printing the result. That is not parallel computing, it is doing the same work multiple times!
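If that duplication is not intentional, one common pattern is to let rank 0 read the file and broadcast the parsed values to the other ranks. This is only a sketch: parse_output_file below is a hypothetical helper standing in for the parsing code in your question, not something from your program.

from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

vals = None
if rank == 0:
    # only rank 0 touches the file; the parsing loop from the question
    # would live inside this hypothetical helper
    vals = parse_output_file('output_file.out')

# bcast sends the object from rank 0 and returns it on every rank
vals = comm.bcast(vals, root=0)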
Some processes can finish this script before others, since the script does not end with a barrier. Use comm.barrier() to synchronize all processes of the communicator comm. Do it only if it is necessary: barriers can hurt performance...
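As a minimal sketch (assuming the per-rank work is whatever your script does above), the barrier goes at the very end, so that no rank leaves the script before all ranks are done:

from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

# ... the per-rank work from your script goes here ...

# every rank blocks here until all ranks of comm have arrived,
# so no process finishes the script earlier than the others
comm.barrier()

if rank == 0:
    print 'all ranks reached the end of the script'

If your mpi4py version does not provide the lowercase barrier(), the classic spelling comm.Barrier() does the same thing.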