I wrote serial and parallel versions of matrix multiplication. The serial code takes about 4 seconds, but when I run the parallel code with, for example, 4 threads, the measured time comes out above 20 seconds, and it grows every time I increase the number of threads. I want to know what is wrong. Here is the OpenMP code:
#include <stdio.h>
#include <omp.h>
#include <stdlib.h>
#include <time.h>

int main(int argc, char *argv[]) {
    int r1, c1;
    int r2, c2;
    int i, j, k;
    int **mat1;
    int **mat2;
    int **result;
    srand(time(0));
    double time_spent = 0;

    printf("Enter dimensions of the first matrix: \n");
    scanf("%d%d", &r1, &c1);
    mat1 = (int **)malloc(r1 * sizeof(int*));
    for (i = 0; i < r1; i++)
        mat1[i] = (int *)malloc(c1 * sizeof(int));
    for (i = 0; i < r1; i++)
        for (j = 0; j < c1; j++)
            mat1[i][j] = (rand() % (10 - 1 + 1)) + 1;   /* random value in 1..10 */

    printf("Enter dimensions of the second matrix: \n");
    scanf("%d%d", &r2, &c2);
    mat2 = (int **)malloc(r2 * sizeof(int*));
    for (i = 0; i < r2; i++)
        mat2[i] = (int *)malloc(c2 * sizeof(int));
    for (i = 0; i < r2; i++)
        for (j = 0; j < c2; j++)
            mat2[i][j] = (rand() % (10 - 1 + 1)) + 1;

    result = (int **)malloc(r1 * sizeof(int*));
    for (i = 0; i < r1; i++)
        result[i] = (int *)calloc(c2, sizeof(int));     /* calloc, not malloc: rows must start at zero, the loop below uses += */

    #pragma omp parallel private(i, j, k) shared(mat1, mat2, result)
    {
        clock_t begin = clock();
        #pragma omp for schedule(static)
        for (i = 0; i < r1; i++) {
            for (j = 0; j < c2; j++) {
                for (k = 0; k < r2; k++) {
                    result[i][j] += mat1[i][k] * mat2[k][j];
                }
            }
        }
        clock_t end = clock();
        time_spent += (double)(end - begin) / CLOCKS_PER_SEC;
    }
    printf("Time elapsed: %f\n", time_spent);
    printf("\n");
    /*
    for (i = 0; i < r1; i++) {
        for (j = 0; j < c2; j++) {
            printf("%d ", result[i][j]);
        }
        printf("\n");
    }
    */
    for (i = 0; i < r1; i++)
        free(mat1[i]);
    free(mat1);
    for (i = 0; i < r2; i++)
        free(mat2[i]);
    free(mat2);
    for (i = 0; i < r1; i++)
        free(result[i]);
    free(result);
}
Q : "Proper way to compute time in openmp"
OpenMP has nothing to do with this. clock() does not measure wall-clock time: it reports the CPU-ticks (accumulated in low-level per-core hardware counters) summed over all threads of the process, and those threads run at the same time. With 4 busy threads, the summed CPU time is roughly 4x the elapsed wall time plus parallelization overhead, so the reported figure grows with every thread you add. (The time_spent += update inside the parallel region is also an unsynchronized write shared by all threads.) Compare that to the OpenMP native tool: double omp_get_wtime(void);
You may like to experiment with further run-time options, to see the improvement impacts of better, more cache-efficient RAM-I/O, other OpenMP scheduling policies, thread-capacities, sharing-avoidance and other options of choice.
///////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
//
// Omp-out-most-BLOCK
// Time elapsed: 0.004587 [ 123 x 123] -O3
// Time elapsed: 0.029240 [ 123 x 123]
// Time elapsed: 5.266409 [1234 x 1234] -O3 [ i, j, k ] ~ User time: 2.729 s Real time: 1.481 s
// Time elapsed: 46.048513 [1234 x 1234]
// Time elapsed: 73.393222 [1234 x 1234] -O3 [ j, k, i ]
// Time elapsed: 86.988589 [1234 x 1234] -O3 [ k, j, i ] ~ User time: 43.613 s Real time: 22.411 s
// w/o Omp
// a pure-[SERIAL] Time elapsed: 0.001580 [ 123 x 123] -O3
// Time elapsed: 0.010290 [ 123 x 123]
// Time elapsed: 4.075591 [1234 x 1234] -O3 [ i, j, k ] ~ User time: 4.209 s Real time: 4.296 s
// Time elapsed: 23.437123 [1234 x 1234] [ i, j, k ] ~ User time: 23.520 s Real time: 23.716 s
// Time elapsed: 42.685109 [1234 x 1234] [ k, j, i ] ~ User time: 42.757 s Real time: 43.187 s
//