Search code examples
mpimpi-io

Is it possible to write with several processors in the same file, at the end of the file, in an ordonated way?


I have 2 processors (this is an example), and I want these 2 processors to write in a file. I want them to write at the end of file, but not in a mixed pattern, like that :

[file content]
proc0
proc1
proc0
proc1
proc0
proc1
(and so on..)

I'd like to make them write following this kind of pattern :

[file content]
proc0
proc0
proc0
proc1
proc1
proc1
(and so on..)

Is it possible? If so, what's the setting to use?


Solution

  • The sequence in which your processes have outputs ready to report is, essentially, unknowable in advance. Even repeated runs of exactly the same MPI program will show differences in the ordering of outputs. So something, somewhere, is going to have to impose an ordering on the writes to the file.

    A very common pattern, the one Wesley has already mentioned, is to have all processes send their outputs to one process, often process 0, and let it deal with the writing to file. This master-writer could sort the outputs before writing but this creates a couple of problems: allocating space to store output before writing it and, more difficult to deal with, determining when a collection of output records can be sorted and written to file and the output buffers be reused. How long does the master-writer wait and how does it know if a process is still working ?

    So it's common to have the master-writer write outputs as it gets them and for another program to order the output file as desired after the parallel program has finished. You could tack this on to your parallel program as a step after mpi_finalize or you could use a completely separate program (such as sort on a Linux machine). Of course, for this to work each output record has to contain some sequencing information on which to sort.

    Another common pattern is to only have one process which does any writing at all, that is, none of the other processes do any output at all. This completely avoids the non-determinism of the sequencing of the writing.

    Another pattern, less common partly because it is more difficult to implement and partly because it depends on underlying mechanisms which are not always available, is to use mpi io. With mpi io multiple processes can write to different parts of a file as if simultaneously. To actually write simultaneously the program needs to be executing on hardware, network and operating system which supports parallel i/o. It can be tricky to implement this even with the right platform, and especially when the volume of output from processes is uncertain.

    In my experience here on SO people asking question such as yours are probably at too early a stage in their MPI experience to be tackling parallel i/o, even if they have access to the necessary hardware.