How are MPI_Scatter and MPI_Gather used from C?

So far, my application reads in a txt file containing a list of integers. These integers need to be stored in an array by the master process, i.e. the processor with rank 0. This is working fine.

Now, when I run the program I have an if statement checking whether it's the master process and if it is, I'm executing the MPI_Scatter command.

From what I understand, this will subdivide the array of numbers and pass the pieces out to the slave processes, i.e. all those with rank > 0. However, I'm not sure how to handle the MPI_Scatter. How does a slave process "subscribe" to get its sub-array? How can I tell the non-master processes to do something with the sub-array?

Can someone please provide a simple example showing how the master process sends out elements from the array, how the slaves each sum their elements and return the result to the master, and how the master then adds all the sums together and prints the total?

My code so far:

#include <stdio.h>
#include <mpi.h>

//A pointer to the file to read in.
FILE *fr;

int main(int argc, char *argv[]) {

int rank,size,n,number_read;
char line[80];
int numbers[30];
int buffer[30];

MPI_Init(&argc, &argv);
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
MPI_Comm_size(MPI_COMM_WORLD, &size);

fr = fopen ("int_data.txt","rt"); //We open the file to be read.

if(rank ==0){
printf("my rank = %d\n",rank);

//Reads in the flat file of integers  and stores it in the array 'numbers' of type int.
n = 0;
while(fgets(line,80,fr) != NULL) {
  sscanf(line, "%d", &number_read);
  numbers[n] = number_read;
  printf("I am processor no. %d --> At element %d we have number: %d\n",rank,n,numbers[n]);
  n++;
}

}
else {
MPI_Gather ( &buffer, 2, MPI_INT, &numbers, 2, MPI_INT, 0, MPI_COMM_WORLD); 
}
return 0;
}

  • This is a common misunderstanding among people new to MPI, particularly with collective operations: they call broadcast (MPI_Bcast) from rank 0 only, expecting the call to somehow "push" the data to the other processors. That's not how MPI routines work; most MPI communication requires both the sender and the receiver to make MPI calls.

    In particular, MPI_Scatter() and MPI_Gather() (and MPI_Bcast, and many others) are collective operations; they have to be called by all of the tasks in the communicator. Every process in the communicator makes the same call, and the operation is performed. (That's why scatter and gather both take a "root" process as one of their parameters: it is where all the data comes from or goes to.) Doing it this way gives the MPI implementation a lot of scope to optimize the communication patterns.

    So here's a simple example that uses both scatter and gather:

    #include <mpi.h>
    #include <stdio.h>
    #include <stdlib.h>

    int main(int argc, char **argv) {
        int size, rank;
        MPI_Init(&argc, &argv);
        MPI_Comm_size(MPI_COMM_WORLD, &size);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        int *globaldata=NULL;   /* only allocated on the root */
        int localdata;

        if (rank == 0) {
            globaldata = malloc(size * sizeof(int) );
            for (int i=0; i<size; i++)
                globaldata[i] = 2*i+1;

            printf("Processor %d has data: ", rank);
            for (int i=0; i<size; i++)
                printf("%d ", globaldata[i]);
            printf("\n");
        }

        /* Called by every rank: the root hands one int to each process. */
        MPI_Scatter(globaldata, 1, MPI_INT, &localdata, 1, MPI_INT, 0, MPI_COMM_WORLD);

        printf("Processor %d has data %d\n", rank, localdata);
        localdata *= 2;
        printf("Processor %d doubling the data, now has %d\n", rank, localdata);

        /* Called by every rank: the root collects one int from each process. */
        MPI_Gather(&localdata, 1, MPI_INT, globaldata, 1, MPI_INT, 0, MPI_COMM_WORLD);

        if (rank == 0) {
            printf("Processor %d has data: ", rank);
            for (int i=0; i<size; i++)
                printf("%d ", globaldata[i]);
            printf("\n");
        }

        if (rank == 0)
            free(globaldata);

        MPI_Finalize();
        return 0;
    }

    Running it gives:

    gpc-f103n084-$ mpicc -o scatter-gather scatter-gather.c -std=c99
    gpc-f103n084-$ mpirun -np 4 ./scatter-gather
    Processor 0 has data: 1 3 5 7 
    Processor 0 has data 1
    Processor 0 doubling the data, now has 2
    Processor 3 has data 7
    Processor 3 doubling the data, now has 14
    Processor 2 has data 5
    Processor 2 doubling the data, now has 10
    Processor 1 has data 3
    Processor 1 doubling the data, now has 6
    Processor 0 has data: 2 6 10 14