Assigning MPI Processes to Solve Components of a Matrix

Question:

I have a C code function that solves the determinate of any n x n matrix using Cramer's rule. I drafted the code, and computed several solutions correctly without error. I would like to augment this code to utilize MPI and dedicate multiple processes to solve segments of the n x n matrix.

For example, given a 5 x 5 matrix with size = 5 processes, I would like dedicate process 0 through 4 to compute the first 5 4 x 4 matrices (using Cramer's decomposition method). Implementation of this idea has been fairly tricky, which is why I wanted to post a question here (trying to teach myself mpi).

Below is the code block that computes the determinate:


int determinate_solver(int r, int* ptr, int rank, int size) {

    int ans = 0, a = 0, b = 0, c = 0, d = 0;
    int inner_sol = 0, inner_det = 0;
    
    // define a method to modify the range of computation with respect to available 
    // processes
    int upperLimit = r;
    int start_val = rank * ceil(upperLimit / size) + 1, end_val;

    /* how should start_val and end_val be implemented correctly? */

    if (rank == (size - 1)) {
        end_val = upperLimit;
    }
    else {
        end_val = start_val + ceil(upperLimit / size) - 1;
    }

        // include conditions for when a 1x1 or
        // 2x2 matrix is included.
        if (r == 1 || r == 2) {
            if (r == 1) {
                ans = ptr[0];
            }
            else {
                a = ptr[0];
                b = ptr[1];
                c = ptr[2];
                d = ptr[3];
                ans = (a * d) - (b * c);
            }
        }
        else {
            int i, j, k, l, n = 0, sign = +1, basic, element;

            // define a new pointer array to take into account
            // that a sub-matrix will be created from the larger
            // matrix. This is apart of the recursive sol'n.
            int* q = (int*)calloc(((r - 1) * (r - 1)), sizeof(int));

                      // 0   // r
            for (int i = 0; i < r; i++) {
                l = 0;
                n = 0;
                // note that the value of i is here.
                // basic is the scaler that will be used in the intermediate solution
                basic = *(ptr + i);

                for (int j = 0; j < r; j++) {

                    for (k = 0; k < r; k++) {

                        element = *(ptr + l);
                        if ((j == 0) || (i == k));
                        else
                        {
                            *(q + n) = element;
                            n = n + 1;
                        }
                        l = l + 1;
                    }
                }
                inner_det = determinate_solver (r - 1, q, rank, size); /* define rank and size  */
                inner_sol = sign * basic * inner_det;
                ans = ans + inner_sol;
                sign = sign * -1;
                printf("The final solution is: %d from rank %d \n \n", ans, rank);
            }
        }
        return ans;
}

The overall program is < 150 lines of code. For the sake of brevity, I excluded unnecessary code in this question.

What I Have Tried

I included end_val and start_val parameters in this code block (see commented section). I intended to use these variables to build the effective ranges of operation for each process. I initially altered the first for loop as shown below,

for(int i = start_val; i < end_val; i++){

/* run some code */

}

but this proved to be unsuccessful (ie. incorrect values were returned from ans). start and end vals work well for defining array values, but struggle in 2 dimensional applications Moreover, I also tried to replace every instance of i with start_val with the hope that this would fix the problem--no luck. I believe that the start_val and end_val can be utilized effectively to solve this problem, but I am unsure how to properly implement this.

How To Help Best

Articles, papers, or SO links that lead to a solution will be accepted as an answer. Alternate solutions not using Cramer's rule may also be accepted.

Solution

Cramer's rule has factorial complexity, so computing the determinant is better done through computing an LU factorization and multiplying the pivots.
Your code recursively copies non-contiguous submatrices into contiguous storage fine. If you want to do that with MPI, you can either have the big matrix on one process, create the submatrices and send them all other processes. (Sending to yourself is a little tricky but not much.) That has a big time bottleneck.
Or you can put the big matrix on every process and let each process extract its submatrix. That is wasteful in memory.

In all, I would not consider this to be a good test problem for MPI, but it's certainly not impossible or even very hard. Just not efficient.