c, mpi, matrix-multiplication, multiple-processes

MPI master process not waiting for other procs to calculate


Hello, I am new to programming with MPI. I'm trying to multiply an NxN matrix (A) by an Nx1 matrix (B) to get an Nx1 result matrix (C). Each process is supposed to calculate one row (element) of C. However, only process 0 (my master process) calculates correctly, and it does not appear to wait for the other processes to finish calculating. I am also unsure whether the non-master procs are sending their results back correctly (or whether they even need to). Here is my code:

#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include "mpi.h"

#define PRINT_VECS 1
#define MAX_RAND 100
#define MASTER 0
#define COLUMNS_B 1
#define N 4

void init_vec(int *vec, int len);
void print_vec(const char *label, int *vec, int len);

void init_vec(int *vec, int len)
{
    int i;
    for (i = 0; i < len; i++)
    {
        vec[i] = rand() % MAX_RAND;
    }    
}

void print_vec(const char *label, int *vec, int len)
{
#if PRINT_VECS
    printf("%s", label);     
    int i;
    for (i = 0; i < len; i++)
    {
        printf("%d ", vec[i]);
    }
    printf("\n\n");
#endif
}

void init_matrix(int** matrix, int rows, int cols)
{
    int i,j;
    for (i = 0; i < rows; i++)
    {
        for (j = 0; j < cols; j++)
        {
            matrix[i][j] = rand() % MAX_RAND;
        }
    }
}

void print_matrix(int** matrix, int rows, int cols)
{
    int i;
    for (i = 0; i < rows; i++)
    {
        printf("|");
        int j;
        for (j = 0; j < cols; j++)
        {
            printf("%d ", matrix[i][j]);
        }
        printf("|\n");
    }
}   


int main(int argc, char *argv[])
{
    int my_rank;
    int num_procs;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &my_rank); //grab this process's rank
    MPI_Comm_size(MPI_COMM_WORLD, &num_procs); //grab the total num of processes

    int results[num_procs]; // used to store the partial sum computed
    int rows, cols, colsB;
    rows = N;
    cols = N;
    colsB = COLUMNS_B;
    int **A; // N x N Matrix
    int B[N]; // N x 1 Matrix
    int **C; // N x 1 Matrix

    double start_time; // use these for timing
    double stop_time;

    if (my_rank == MASTER)
    {
        printf("Number of processes: %d\n", num_procs);
        printf("N: %d\n", N);
        srand(time(NULL));

        // init A
        int i;
        A = malloc(rows * sizeof *A);
        for (i = 0; i < rows; i++)
        {
            A[i] = malloc(cols * sizeof *A[i]);
        }
        init_matrix(A, rows, cols);
        printf("Matrix A:\n");
        print_matrix(A, rows, cols);

        // init B
        init_vec(B, N);
        print_vec("Matrix B:\n", B, N);

        // init C
        C = malloc(rows * sizeof *C);
        for (i = 0; i < rows; i++)
        {
            C[i] = malloc(colsB * sizeof *C[i]);
        }

        start_time = MPI_Wtime();
    }

    MPI_Bcast(B, N, MPI_INT, 0, MPI_COMM_WORLD);
    //MPI_Bcast(A, N, MPI_INT, 0, MPI_COMM_WORLD);

    int row = my_rank;

    int my_sum = 0;

    int i;
    if (my_rank < N)
    {
        for (i = 0; i < N; i++)
        {
            int num = A[row][i] * B[i];
            my_sum = my_sum + num;
        }



    C[row] = &my_sum;
    printf("HAI FROM PROCESS %d! I will calculate row %d. My calculation: %d\n", my_rank, row, my_sum);
}

//MPI_Gather(&C, 1, MPI_INT, results, 1, MPI_INT, 0, MPI_COMM_WORLD);

if (my_rank == MASTER)
{
    stop_time = MPI_Wtime();
    printf("\nMatrix C:\n");
    print_matrix(C, rows, colsB);
    printf("Total time (sec): %f\n", stop_time - start_time);
}

MPI_Finalize();

return EXIT_SUCCESS;;

}

I'm pretty sure I'm close but I'm just missing something. I've tried enabling some of the commented-out statements (broadcasting the A matrix as well, and/or calling MPI_Gather), but nothing seems to give results from any procs other than the master proc, so clearly I'm still doing something wrong. Here is some sample output:

Number of processes: 28
N: 4
Matrix A:
|11 30 69 24 |
|83 38 66 71 |
|68 71 27 33 |
|58 5 50 10 |
Matrix B:
1 58 81 44

HAI FROM PROCESS 0! I will calculate row 0. My calculation: 8396

Matrix C:
|8396 |
|-2107258888 |
|-2107258920 |
|-2107258888 |
Total time (sec): 0.000078

So proc 0 calculates correctly, but proc 1 crashes with a seg fault and I can't figure out why. The error I'm getting is: mpirun noticed that process rank 1 with PID 0 exited on signal 11 (Segmentation fault). Any help would be greatly appreciated!


Solution

  • This is your program with the issues fixed:

    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>
    #include "mpi.h"
    
    #define PRINT_VECS 1
    #define MAX_RAND 100
    #define MASTER 0
    #define COLUMNS_B 1
    #define N 4
    
    void init_vec(int *vec, int len);
    void print_vec(const char *label, int *vec, int len);
    
    void init_vec(int *vec, int len)
    {
        int i;
        for (i = 0; i < len; i++)
        {
            vec[i] = rand() % MAX_RAND;
        }
    }
    
    void print_vec(const char *label, int *vec, int len)
    {
    #if PRINT_VECS
        printf("%s", label);
        int i;
        for (i = 0; i < len; i++)
        {
            printf("%d ", vec[i]);
        }
        printf("\n\n");
    #endif
    }
    
    void init_matrix(int** matrix, int rows, int cols)
    {
        int i,j;
        for (i = 0; i < rows; i++)
        {
            for (j = 0; j < cols; j++)
            {
                matrix[i][j] = rand() % MAX_RAND;
            }
        }
    }
    
    void print_matrix(int** matrix, int rows, int cols)
    {
        int i;
        for (i = 0; i < rows; i++)
        {
            printf("|");
            int j;
            for (j = 0; j < cols; j++)
            {
                printf("%d ", matrix[i][j]);
            }
            printf("|\n");
        }
    }
    
    
    int main(int argc, char *argv[])
    {
        int my_rank;
        int num_procs;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &my_rank); //grab this process's rank
        MPI_Comm_size(MPI_COMM_WORLD, &num_procs); //grab the total num of processes
    
        int results[num_procs]; // used to store the partial sum computed
        int rows, cols, colsB, k;
        rows = N;
        cols = N;
        colsB = COLUMNS_B;
        int **A; // N x N Matrix
        int B[N]; // N x 1 Matrix
        int C[N]; // N x 1 Matrix
    
        // Allocate memory for the NxN matrix on all processes
        A = (int**) malloc(N * sizeof(int*));
        for(k=0;k<N;k++)
            A[k]= (int*) malloc(N * sizeof(int));
    
        double start_time; // use these for timing
        double stop_time;
    
        if (my_rank == MASTER)
        {
            printf("Number of processes: %d\n", num_procs);
            printf("N: %d\n", N);
            srand(time(NULL));
    
            // Initialize arrays on root only
            init_matrix(A, rows, cols);
            printf("Matrix A:\n");
            print_matrix(A, rows, cols);
    
            init_vec(B, N);
            print_vec("Matrix B:\n", B, N);
    
            start_time = MPI_Wtime();
        }
    
        // Be consistent with names vs. values to avoid bugs
        MPI_Bcast(B, N, MPI_INT, MASTER, MPI_COMM_WORLD);
    
        for (k=0; k<N; k++)
            MPI_Bcast(&(A[k][0]), N, MPI_INT, MASTER, MPI_COMM_WORLD);
    
        int row = my_rank;  
        int my_sum = 0;
    
        int i,num;
        if (my_rank < N)
        {
            for (i = 0; i < N; i++)
            {
                num = A[row][i] * B[i];
                my_sum = my_sum + num;
            }
    
            C[row] = my_sum;
            printf("HAI FROM PROCESS %d! I will calculate row %d. My calculation: %d\n", my_rank, row, my_sum);
       }
    
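        // Gather every process's sum into results (num_procs elements) on the root.
        // The send and receive buffers must not overlap, and C holds only N values.
        MPI_Gather(&my_sum, 1, MPI_INT, results, 1, MPI_INT, MASTER, MPI_COMM_WORLD);
    
        if (my_rank == MASTER)
        {
            // Copy the sums of the ranks that computed a row into C;
            // rows with no matching rank (num_procs < N) are left as 0
            for (k = 0; k < N; k++)
                C[k] = (k < num_procs) ? results[k] : 0;
            stop_time = MPI_Wtime();
            print_vec("Matrix C:\n", C, N);
            printf("Total time (sec): %f\n", stop_time - start_time);
        }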
    
        // Free matrix A
        for(k=0;k<N;k++)
            free(A[k]);
        free(A);
    
        MPI_Finalize();
    
        return EXIT_SUCCESS;
    }
    

    As mentioned in the comments, you need to allocate memory for all of your matrices on every process, not only on the root. In the original code only the master allocated A and C, so the other ranks dereferenced uninitialized pointers, which most likely explains the segmentation fault on rank 1. Allocation is separate from initialization: A and B are still filled in only on the root process. Here A is allocated with malloc, while C is declared as a fixed-size array and used as a vector in the same way as B. This is not required, but it seems like a better choice since C is a 1D array and is in essence equivalent to B.

    B is broadcast to all processes as before, but the root argument is MASTER instead of the literal 0, so that changing the value of MASTER updates every use of it. This is generally good programming practice, and it is applied throughout the new code.

    A is broadcast in a simple but surely less efficient way than @Gilles Gouailardet suggested: the program simply broadcasts every row of A separately,

    for (k=0; k<N; k++)
        MPI_Bcast(&(A[k][0]), N, MPI_INT, MASTER, MPI_COMM_WORLD);
    

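    This works because C stores arrays in row-major order and each row A[k] comes from its own malloc call, so the N elements of the kth row are contiguous in memory and can be sent as one buffer. The rows themselves are separate allocations, so the whole matrix cannot be sent with a single broadcast of N*N elements. A column of A could not be sent this way either, because its elements are not contiguous.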
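To broadcast the whole matrix in one call, its elements have to sit in a single contiguous block. The sketch below shows one way to get that layout while keeping the A[i][j] indexing used in the listing. The helper names alloc_matrix_contiguous and free_matrix_contiguous are made up for this illustration and are not part of the original program or of MPI.

```c
#include <assert.h>
#include <stdlib.h>

/* Allocate a rows x cols int matrix whose elements live in ONE contiguous
 * block, while still allowing the familiar m[i][j] indexing.
 * Hypothetical helper for illustration. Returns NULL on allocation failure. */
int **alloc_matrix_contiguous(int rows, int cols)
{
    assert(rows > 0 && cols > 0);

    /* Array of row pointers */
    int **m = malloc((size_t)rows * sizeof *m);
    if (m == NULL)
        return NULL;

    /* One block for all elements; calloc zero-initializes it */
    int *data = calloc((size_t)rows * (size_t)cols, sizeof *data);
    if (data == NULL) {
        free(m);
        return NULL;
    }

    /* Point each row at its slice of the block */
    for (int i = 0; i < rows; i++)
        m[i] = data + (size_t)i * (size_t)cols;

    return m;
}

/* Release a matrix created by alloc_matrix_contiguous */
void free_matrix_contiguous(int **m)
{
    if (m != NULL) {
        free(m[0]); /* the element block starts at row 0 */
        free(m);
    }
}
```

With A allocated this way on every process, the loop over rows could be replaced by a single MPI_Bcast(&A[0][0], N * N, MPI_INT, MASTER, MPI_COMM_WORLD), and init_matrix and print_matrix would keep working unchanged because they only rely on the int** interface.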

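    The remaining changes are assigning the value of my_sum, rather than a pointer to it, to C[row] (C[row] = my_sum;) and the gathering operation. MPI_Gather is a collective call, so every process, including those with my_rank >= N, must take part in it, and the root cannot finish the call until all contributions have arrived. The gather should be written so that each process sends its own my_sum and the root receives into a buffer with one element per process, such as results:

    MPI_Gather(&my_sum, 1, MPI_INT, results, 
           1, MPI_INT, MASTER, MPI_COMM_WORLD);
    

    On the root, results[k] then holds the sum computed by rank k, and the first N entries can be copied into C, which is printed using the print_vec function. Passing &C[row] as both the send and the receive buffer is not a safe alternative: the MPI standard does not allow the two buffers to overlap on the root, and C has only N elements, so gathering from more than N processes (28 in the question's run) would write past its end.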
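When debugging the parallel version, it can help to compare the gathered values against a plain serial product computed on the root. The following is a minimal sketch of such a check. The function name matvec_serial is an assumption made for this example and does not appear in the original code.

```c
#include <assert.h>
#include <stddef.h>

/* Serial reference: C = A * B for an n x n matrix A and a length-n vector B.
 * A is an array of row pointers, matching the int** layout in the listing.
 * Hypothetical helper for illustration. */
void matvec_serial(int **A, const int *B, int *C, int n)
{
    assert(A != NULL && B != NULL && C != NULL && n > 0);

    for (int i = 0; i < n; i++) {
        int sum = 0;
        /* Dot product of row i of A with B */
        for (int j = 0; j < n; j++)
            sum += A[i][j] * B[j];
        C[i] = sum;
    }
}
```

For the sample A and B printed in the question, this gives 8396, 10757, 7825 and 4838. The first value matches what process 0 reported, so the remaining three are what ranks 1 to 3 should contribute once the gather works.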