c multithreading parallel-processing pthreads matrix-multiplication

Using pThread is worse than Sequential

Using pthread library, I've had written some code for matrix multiplication.

AFAIK, the program using thread would be much faster than program without using thread. But, unlike my expectation... the outcome(Elapsed time) is totally reversed, the program not using thread is much faster than thread program. What happened.. I couldn't find why this is happened.

The Execution time below..

Time ( # of thread: 4 )

0.000451 sec

Time ( no thread )

0.000002 sec

I've coded this two version in one file(.c)

but always pThread is worse than sequential

Serial version ( no using thread )

void serial_multi()
{
    for (int i = 0; i < MAX; i++)
        for(int j = 0; j < MAX; j++)
            for(int k = 0; k < MAX; k++)
                _matC[i][j] += matA[i][k] * matB[k][j];
}

Using Thread ( # of thread : 4 )

 int step_i = 0;
 void* multi(void* arg)
 {
    int core = step_i++;
    // each thread computes 1/4th of matrix multiplication
    for (int i = core * MAX / 4; i < (core + 1) * MAX/4; i++)
        for(int j = 0; j < MAX; j++)
            for(int k = 0; k < MAX; k++)
                matC[i][j] += matA[i][k] * matB[k][j];
    return NULL; 
 }

main function

 int main()
{
 printf("PID: %d\n",getpid());
 // generating random values in matA and matB
 for (int i = 0; i < MAX; i++){
     for (int j = 0; j < MAX; j++){
         matA[i][j] = rand() % 10;
         matB[i][j] = rand() % 10;
     }
 }

 //cout << endl << "Matrix A" << endl;
 for (int i = 0; i < MAX; i++){
     for (int j = 0; j < MAX; j++)
         printf("%d ",matA[i][j]);
     printf("\n");
 }


 //cout << endl << "Matrix B" << endl;
 for (int i = 0; i < MAX; i++){
     for (int j = 0; j < MAX; j++)
         printf("%d ",matB[i][j]);
     printf("\n");
 }

 // declaring 4 threads
 pthread_t threads[MAX_THREAD];

 // creating 4 threads, each evaluating its own part
     // Time Estimation
 clock_t start = clock();
 for (int i = 0; i < MAX_THREAD; i++){

     pthread_create(&threads[i], NULL, multi, NULL);
 }

 // joining and waiting for all threads to complete
 for (int i = 0; i < MAX_THREAD; i++)
     pthread_join(threads[i], NULL);

 clock_t end = clock();
 printf("Time: %lf\n", (double)(end-start)/CLOCKS_PER_SEC);
 // displaying the result matrix
 //cout << endl << "Multiplication of A and B" << endl;

 for (int i = 0; i < MAX; i++){
     for (int j = 0; j < MAX; j++)
         printf("%d ",matC[i][j]);
     printf("\n");
 }



 return 0;
}

Solution

the threaded program runs slower because of two items.

the threading takes extra time for context switches
the creation/destruction of a thread takes time

Those two items are the main 'extra' time takers.

Also, you should note that this program is CPU bound, not I/O bound. A I/O bound program would benefit from threading, but; a CPU bound program is (significantly) delayed due to all the context switching, the creation of the threads, and the destruction of the threads.

Note: the program could easily be made quicker via the use of a 'thread pool'.