I have some setup where all processes get a consecutive chunk of work, and I want to save all the output together at the end as a single file, like the following:
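// Each rank handles the consecutive index range [start_ind, end_ind).
int start_ind = split_work(mpi_rank, mpi_size), end_ind = split_work(mpi_rank+1, mpi_size);
// Size the buffer for this rank's chunk only; writing into an empty vector is undefined behavior.
vector<double> results(end_ind - start_ind);
for(int i=start_ind; i<end_ind; i++){
    results[i - start_ind] = do_work(i);
}
// MPI_File_open is collective: every rank in the communicator must call it.
MPI_File handler;
MPI_File_open(MPI_COMM_WORLD, filename, MPI_MODE_CREATE|MPI_MODE_WRONLY, MPI_INFO_NULL, &handler);
MPI_Status status;
// Write this rank's chunk at its byte offset in the shared file.
MPI_File_write_at(handler, (MPI_Offset)start_ind*sizeof(double), results.data(),
                  end_ind - start_ind, MPI_DOUBLE, &status);
MPI_File_close(&handler);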
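For context, `split_work` is not shown above. A minimal sketch of what such a helper might look like, assuming an even block distribution; `total_items` is a hypothetical constant that does not appear in the real code, and the actual helper may differ:

```cpp
#include <cstdint>

// Hypothetical total number of work items (an assumption for illustration).
constexpr int total_items = 1000;

// Returns the first index owned by `rank` when total_items are split into
// `size` consecutive chunks whose lengths differ by at most one.
// split_work(size, size) returns total_items, so rank r owns
// [split_work(r, size), split_work(r + 1, size)).
int split_work(int rank, int size) {
    // Widen before multiplying so large item counts do not overflow int.
    return static_cast<int>(static_cast<std::int64_t>(total_items) * rank / size);
}
```

Note that equal chunk sizes do not imply equal run times when `do_work(i)` varies in cost, which is the situation described next.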
However, the work is sometimes poorly balanced, and half the processes can finish hours before the other half. As far as I can tell, the early finishers then spin at 100% of a CPU for hours until every process reaches MPI_File_open. This is obviously undesirable. What is the best practice in this case if I want the output to end up in one single file?
I found an answer in this question that solves my problem. With OpenMPI, running
mpirun -np N --mca mpi_yield_when_idle 1 ./a.out
makes any process that is waiting in a blocking call yield the CPU to other runnable processes. This comes at the cost of increased latency for cross-process communication, but that is not a problem for my use case, which doesn't send any messages while performing computation.
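An alternative that does not depend on an Open MPI setting is to have early finishers wait on a non-blocking barrier and sleep between polls, so they reach the collective MPI_File_open at about the same time as the slowest rank. The following is a minimal sketch rather than a tested solution: `sleep_until` is a hypothetical helper, the 100 ms poll interval is an arbitrary assumption, and the MPI calls (`MPI_Ibarrier` and `MPI_Test`, which require MPI-3) are shown as comments so the block compiles without MPI headers.

```cpp
#include <chrono>
#include <thread>

// Polls `is_done` until it returns true, sleeping `interval` between checks
// so the waiting process leaves its core idle instead of spinning.
// Returns the number of times it slept.
template <typename Pred>
int sleep_until(Pred is_done, std::chrono::milliseconds interval) {
    int sleeps = 0;
    while (!is_done()) {
        std::this_thread::sleep_for(interval);
        ++sleeps;
    }
    return sleeps;
}

// Intended use, placed just before MPI_File_open (assumes an MPI-3 library):
//
//   MPI_Request req;
//   MPI_Ibarrier(MPI_COMM_WORLD, &req);            // non-blocking barrier
//   sleep_until([&] {
//       int flag = 0;
//       MPI_Test(&req, &flag, MPI_STATUS_IGNORE);  // checks for completion
//       return flag != 0;
//   }, std::chrono::milliseconds(100));
//   // All ranks have now finished their work; open and write the file.
```

The trade-off is that each rank may wake up as much as one poll interval late, which is negligible next to hours of imbalance.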