Search code examples
cmpiopenmpireduction

MPI_Reduce doesn't work as expected


I am very new to MPI and I'm trying to use MPI_Reduce to find the maximum of an integer array. I have an integer array arr of size arraysize, and here is my code:

MPI_Init(&argc, &argv);
MPI_Comm_size(MPI_COMM_WORLD, &number_of_processes);
MPI_Comm_rank(MPI_COMM_WORLD, &my_process_id);
MPI_Bcast(arr, arraysize, MPI_INT, 0, MPI_COMM_WORLD);
MPI_Reduce(arr, &result, arraysize, MPI_INT, MPI_MAX, 0, MPI_COMM_WORLD);

if(!my_process_id){
    printf("%d", result);
}

MPI_Finalize();

My program compiles and runs on 8 processes with no problems, however, nothing gets printed on the screen. For debugging purposes, I change the conditional to if(my_process_id) (without !) and run. Then I get a weird output such as 00030000 where 3 could be anywhere in this list nondeterministically. 3 is my array's first value (but not the maximum). I know about parallel programming in general (not expert, but I usually know what I am doing) but I am very new to MPI to I may be doing an obvious mistake. All the tutorials that I've seen online have code samples similar to mine, and I have no idea what I'm doing wrong.

Thanks,

Can.


Solution

  • MPI_Reduce works exactly the way it is supposed to work. You are the one who does not use it the way it is supposed to be used.

    MPI_Reduce performs element-wise reduction of data, distributed among the ranks in the MPI job. Both the source and the destination buffers should be arrays of size arraysize, e.g.:

    int arr[arraysize];
    int result[arraysize];
    
    // Fill local arr with meaningful data
    ...
    // Perform reduction
    MPI_Reduce(arr, result, arraysize, MPI_INT, MPI_MAX, 0, MPI_COMM_WORLD);
    

    What MPI_Reduce does is the following:

    result[0] = max(arr_0[0], arr_1[0], ..., arr_(N-1)[0]);
    result[1] = max(arr_0[1], arr_1[1], ..., arr_(N-1)[1]);
    ...
    result[arraysize-1] = max(arr_0[arraysize-1], ..., arr_(N-1)[arraysize-1]);
    

    where arr_0 is the copy of arr in rank 0, arr_1 is the copy of arr in rank 1, and so on.

    The combination of MPI_Bcast, followed by a reduction with MPI_MAX does aboslutely nothing since all copies of arr would have the same values after the broadcast and the aplication of an element-wise max reduction would simply yield the same values. Worse, I would presume that result in your code is a scalar integer variable, hence MPI_Reduce would overwrite arraysize-1 elements past result and would most likely destroy the stack frame, overwrite the value of my_process_id in rank 0 so it would not be 0 anymore (hence nothing is printed) and crashing rank 0 afterwards. Of course, it all depends on how the local variables are aranged in the stack - the implications might not be as severe as I have described them.

    If you'd like to find the maximum value of an array, you should first distribute it using MPI_Scatter, then use MPI_Reduce to perform element-wise reduction and then perform another reduction on the result:

    int elements_per_proc = arraysize/number_of_processes;
    int arr[arraysize];
    int subarr[elements_per_proc];
    int partres[elements_per_proc];
    
    // Distribute the array
    MPI_Scatter(arr, elements_per_proc, MPI_INT,
                subarr, elements_per_proc, MPI_INT, 0, MPI_COMM_WORLD);
    
    // Perform element-wise max reduction
    MPI_Reduce(subarr, partres, elements_per_proc, MPI_INT,
               MPI_MAX, 0, MPI_COMM_WORLD);
    
    // Take the highest of the partial max values
    result = partres[0];
    for (int i = 1; i < elements_per_proc; i++)
       if (partres[i] > result) result = partres[i];
    

    Now you have the value of the maximum element in result.

    Or even better:

    int localmax;
    
    // Distribute the array
    MPI_Scatter(arr, elements_per_proc, MPI_INT,
                subarr, elements_per_proc, MPI_INT, 0, MPI_COMM_WORLD);
    
    // Find the maximum element of the local subarray
    localmax = subarr[0];
    for (int i = 1; i < elements_per_proc; i++)
       if (subarr[i] > localmax) localmax = subarr[i];
    
    // Perform global max reduction
    MPI_Reduce(&localmax, &result, 1, MPI_INT, MPI_MAX, 0, MPI_COMM_WORLD);