Is there any way in MPI programs to order the execution of processes?

Say I have 2 processes, P1 and P2, and both are printing an array of 1000 data points. As we know, nothing is guaranteed about the order of the output: P1 may print its data first followed by P2, or vice versa, or the two outputs may get interleaved. Now say I want to output the values of P1 first, followed by those of P2. Is there any way to guarantee that?

Here is a minimal reproducible example in which the output gets mixed:

#include <string.h>
#include <stdio.h>
#include <stdlib.h>
#include "mpi.h"

int main(int argc, char *argv[])
{
    MPI_Init(&argc, &argv);

    int myrank, size; // size holds the number of processes

    MPI_Comm_rank(MPI_COMM_WORLD, &myrank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (myrank == 0)
    {
        int a[1000];

        for (int i = 0; i < 1000; i++)
        {
            a[i] = i + 1;
        }

        for (int i = 0; i < 1000; i++)
        {
            printf(" %d", a[i]);
        }
    }

    if (myrank == 1)
    {
        int a[1000];

        for (int i = 0; i < 1000; i++)
        {
            a[i] = i + 1;
        }

        for (int i = 0; i < 1000; i++)
        {
            printf(" %d", a[i]);
        }
    }

    MPI_Finalize();
    return 0;
}

The only way I can think of to print the data in order is to send the data from, say, P1 to P0 and then print everything from P0. But then we incur the extra cost of sending data from one process to another.

Solution

Now say I want to output the values of P1 first followed by P2. Is there any way by which I can guarantee that?

This is not how MPI is meant to be used, nor, IMO, parallelism in general. Coordinating the printing of output to the console among processes will greatly degrade the performance of the parallel version, which defeats one of the purposes of parallelism, i.e., reducing the overall execution time.

Most of the time, one is better off making just one process responsible for printing the output to the console (typically the master process, i.e., the process with rank 0).

Citing @Gilles Gouaillardet:

The only safe option is to send all the data to a given rank, and then print the data from that rank.

You could try using MPI_Barrier to coordinate the processes so that they print the output as you want; however (citing @Hristo Iliev):

Using barriers like that only works for local launches when (and if) the processes share the same controlling terminal. Otherwise, it is entirely to the discretion of the I/O redirection mechanism of the MPI implementation.

If it is for debugging purposes, you can use a good MPI-aware debugger that lets you inspect the data of each process. Alternatively, you can limit the output to one process per run, so that you can check, run by run, whether every process has the data it should have.