I'm trying to calculate matrix multiplication using the MPI Scatter() and Gather() functions, and I want to be able to choose the matrix size without having to change the number of processes used.
I've gone through the posts "MPI Matrix Multiplication with scatter gather" and "matrix multiplication using Mpi_Scatter and Mpi_Gather", but both use methods that only work when the matrix size is the same as the number of processes/nodes, not when a larger matrix size is defined.
My code with an example matrix size of 8:
#define MAT_SIZE 8
void initialiseMatricies(float a[][MAT_SIZE], float b[][MAT_SIZE], float c[][MAT_SIZE])
{
    int num = 11;
    for (int i = 0; i < MAT_SIZE; i++)
    {
        for (int j = 0; j < MAT_SIZE; j++)
        {
            a[i][j] = num;
            b[i][j] = num+1;
            c[i][j] = 0;
        }
        num++;
    }
}
int main(int argc, char **argv)
{
    // MPI Variables
    int rank, size;
    // Create the main matrices with the predefined size
    float matrixA[MAT_SIZE][MAT_SIZE];
    float matrixB[MAT_SIZE][MAT_SIZE];
    float matrixC[MAT_SIZE][MAT_SIZE];
    // Create the separate arrays for storing the scattered rows from the main matrices
    float matrixARows[MAT_SIZE];
    float matrixCRows[MAT_SIZE];
    // Initialise the matrices
    initialiseMatricies(matrixA, matrixB, matrixC);
    // Start the MPI parallel sequence
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    int count = MAT_SIZE * MAT_SIZE / (size * (MAT_SIZE / size));
    // Scatter rows of first matrix to different processes
    MPI_Scatter(matrixA, count, MPI_INT, matrixARows, count, MPI_INT, 0, MPI_COMM_WORLD);
    // Broadcast second matrix to all processes
    MPI_Bcast(matrixB, MAT_SIZE * MAT_SIZE, MPI_INT, 0, MPI_COMM_WORLD);
    MPI_Barrier(MPI_COMM_WORLD);
    // Matrix Multiplication
    int sum = 0;
    for (int i = 0; i < MAT_SIZE; i++)
    {
        for (int j = 0; j < MAT_SIZE; j++)
        {
            sum += matrixARows[j] * matrixB[j][i];
        }
        matrixCRows[i] = sum;
    }
    // Gather the row sums from the buffer and put it in matrix C
    MPI_Gather(matrixCRows, count, MPI_INT, matrixC, count, MPI_INT, 0, MPI_COMM_WORLD);
    MPI_Barrier(MPI_COMM_WORLD);
    MPI_Finalize();
    // if it's on the master node
    if (rank == 0)
        printResults(matrixA, matrixB, matrixC, calcTime);
    return 0;
}
Output:
1364 2728 4092 5456 6820 8184 9548 10912
1488 2976 4464 5952 7440 8928 10416 11904
1612 3224 4836 6448 8060 9672 11284 12896
1736 3472 5208 6944 8680 10416 12152 13888
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
The rows that are computed are correct, and if I set the number of processes to 8 (the same as the matrix size) then the whole matrix is calculated correctly, but I don't want to have to do that. I believe my issue stems from the count passed to Scatter() and Gather(). If I set the count to:
int count = MAT_SIZE * MAT_SIZE / size;
Then the output becomes:
1364 2728 4092 5456 6820 8184 9548 10912
-1.07374e+08 -1.07374e+08 11 11 11 11 11 11
1612 3224 4836 6448 8060 9672 11284 12896
-1.07374e+08 -1.07374e+08 13 13 13 13 13 13
1860 3720 5580 7440 9300 11160 13020 14880
-1.07374e+08 -1.07374e+08 15 15 15 15 15 15
2108 4216 6324 8432 10540 12648 14756 16864
-1.07374e+08 -1.07374e+08 17 17 17 17 17 17
This is because the count essentially goes from 8 (previously) to 16. It also gives me a debug error for each process saying:
"Run-Time Check Failure #2 - Stack around the variable 'matrixC' was corrupted"
I've been changing this count formula around for a couple of days now and still can't figure it out. I've also tried changing the start and end iterations of my matrix multiplication loops, but that didn't help either.
To allow for a larger matrix size, the separate arrays should be 2D arrays whose first dimension is the number of rows handled by each process, i.e. the matrix size divided by the number of tasks/processes:
float matrixARows[MAT_SIZE/size][MAT_SIZE];
float matrixCRows[MAT_SIZE/size][MAT_SIZE];
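Because size is only known at run time (after MPI_Comm_size), the two declarations above rely on variable-length arrays, which not every compiler accepts. A minimal sketch of a heap-based alternative follows; alloc_row_block is a hypothetical helper name (not part of MPI), and the block is assumed to be stored flat in row-major order:

```c
#include <stdlib.h>

/* Hypothetical helper (not an MPI function): allocate a zeroed block holding
   (mat_size / nprocs) rows of mat_size floats, stored contiguously in
   row-major order. Returns NULL if the arguments are invalid, the rows do
   not divide evenly, or the allocation fails. */
float *alloc_row_block(int mat_size, int nprocs)
{
    if (mat_size <= 0 || nprocs <= 0 || mat_size % nprocs != 0)
        return NULL;
    size_t rows = (size_t)(mat_size / nprocs);
    return calloc(rows * (size_t)mat_size, sizeof(float));
}
```

With this layout, element (k, j) of the block is block[k * mat_size + j]. The pointer can be passed as the receive buffer of the scatter (or the send buffer of the gather) and released with free() once it is no longer needed.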
Count should be:
int count = MAT_SIZE * MAT_SIZE / size;
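The count passed to Scatter() and Gather() is the number of elements per process, not the total. As a rough sketch of the arithmetic, assuming whole rows are handed out evenly (elements_per_process is a hypothetical helper name used only for illustration):

```c
/* Hypothetical helper: number of elements each process sends or receives
   when mat_size rows of mat_size elements are split evenly across nprocs.
   Returns -1 if the rows cannot be divided evenly. */
int elements_per_process(int mat_size, int nprocs)
{
    if (mat_size <= 0 || nprocs <= 0 || mat_size % nprocs != 0)
        return -1;
    int rows = mat_size / nprocs;   /* whole rows per process */
    return rows * mat_size;         /* elements per process   */
}
```

For MAT_SIZE 8 this gives 16 elements (2 rows) per process with 4 processes and 8 elements (1 row) with 8 processes, which is why the receive buffers need room for MAT_SIZE/size rows rather than a single row.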
And the matrix multiplication changed to:
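// Each process multiplies its MAT_SIZE/size rows of A by the whole of B
float sum = 0; // float, so non-integer products are not truncated
for (int k = 0; k < MAT_SIZE/size; k++)
{
    for (int i = 0; i < MAT_SIZE; i++)
    {
        for (int j = 0; j < MAT_SIZE; j++)
        {
            sum += matrixARows[k][j] * matrixB[j][i];
        }
        matrixCRows[k][i] = sum;
        sum = 0; // reset before the next output element
    }
}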
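The loop above can be checked on its own without MPI. Here is a minimal sketch of the same per-process computation as a standalone function, assuming flat row-major arrays; multiply_rows and its parameter names are illustrative and not taken from the original code:

```c
/* Multiply a block of `rows` rows of A (each of length n) by the full
   n x n matrix B, writing the matching rows of C.
   All arrays are flat and row-major. */
void multiply_rows(const float *a_rows, const float *b, float *c_rows,
                   int rows, int n)
{
    for (int k = 0; k < rows; k++) {
        for (int i = 0; i < n; i++) {
            float sum = 0.0f;  /* reset for every output element */
            for (int j = 0; j < n; j++)
                sum += a_rows[k * n + j] * b[j * n + i];
            c_rows[k * n + i] = sum;
        }
    }
}
```

In the MPI program, each process would call this with rows = MAT_SIZE/size on the block it received from the scatter, then hand c_rows to the gather.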
Note: The matrix size must be divisible by the number of tasks/processes. E.g. if using 4 tasks, the matrix size must be a multiple of 4 (4, 8, 12, 16, 32, 64, 128, etc.).
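If the divisibility restriction is a problem, one possible way around it (not part of the approach above) is to switch to MPI_Scatterv and MPI_Gatherv, which take a per-process count and displacement. The sketch below only computes those two arrays; compute_row_distribution is a hypothetical helper name, and giving the extra rows to the lowest ranks is an assumption made for illustration:

```c
/* Hypothetical helper: fill counts[] and displs[] (each of length nprocs)
   so that n rows of n elements are spread as evenly as possible.
   The first (n % nprocs) ranks each get one extra row.
   Both arrays are expressed in elements, not rows. */
void compute_row_distribution(int n, int nprocs, int *counts, int *displs)
{
    int base = n / nprocs;    /* rows every rank gets     */
    int extra = n % nprocs;   /* leftover rows to hand out */
    int offset = 0;
    for (int r = 0; r < nprocs; r++) {
        int rows = base + (r < extra ? 1 : 0);
        counts[r] = rows * n;
        displs[r] = offset;
        offset += counts[r];
    }
}
```

With a matrix size of 8 and 3 processes this yields 3, 3 and 2 rows. Each rank would then size its row buffers and its outer multiplication loop from its own counts[rank] / n instead of MAT_SIZE/size.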