Search code examples
cmultithreadingparallel-processingpthreadsmatrix-multiplication

Using pThread is worse than Sequential


Using pthread library, I've had written some code for matrix multiplication.

AFAIK, the program using thread would be much faster than program without using thread. But, unlike my expectation... the outcome(Elapsed time) is totally reversed, the program not using thread is much faster than thread program. What happened.. I couldn't find why this is happened.

The Execution time below..

Time ( # of thread: 4 )

0.000451 sec

Time ( no thread )

0.000002 sec

I've coded this two version in one file(.c)

but always pThread is worse than sequential

.

  1. Serial version ( no using thread )

    void serial_multi()
    {
        for (int i = 0; i < MAX; i++)
            for(int j = 0; j < MAX; j++)
                for(int k = 0; k < MAX; k++)
                    _matC[i][j] += matA[i][k] * matB[k][j];
    }
    
  2. Using Thread ( # of thread : 4 )

     int step_i = 0;
     void* multi(void* arg)
     {
        int core = step_i++;
        // each thread computes 1/4th of matrix multiplication
        for (int i = core * MAX / 4; i < (core + 1) * MAX/4; i++)
            for(int j = 0; j < MAX; j++)
                for(int k = 0; k < MAX; k++)
                    matC[i][j] += matA[i][k] * matB[k][j];
        return NULL; 
     }
    
  3. main function

     int main()
    {
     printf("PID: %d\n",getpid());
     // generating random values in matA and matB
     for (int i = 0; i < MAX; i++){
         for (int j = 0; j < MAX; j++){
             matA[i][j] = rand() % 10;
             matB[i][j] = rand() % 10;
         }
     }
    
     //cout << endl << "Matrix A" << endl;
     for (int i = 0; i < MAX; i++){
         for (int j = 0; j < MAX; j++)
             printf("%d ",matA[i][j]);
         printf("\n");
     }
    
    
     //cout << endl << "Matrix B" << endl;
     for (int i = 0; i < MAX; i++){
         for (int j = 0; j < MAX; j++)
             printf("%d ",matB[i][j]);
         printf("\n");
     }
    
     // declaring 4 threads
     pthread_t threads[MAX_THREAD];
    
     // creating 4 threads, each evaluating its own part
         // Time Estimation
     clock_t start = clock();
     for (int i = 0; i < MAX_THREAD; i++){
    
         pthread_create(&threads[i], NULL, multi, NULL);
     }
    
     // joining and waiting for all threads to complete
     for (int i = 0; i < MAX_THREAD; i++)
         pthread_join(threads[i], NULL);
    
     clock_t end = clock();
     printf("Time: %lf\n", (double)(end-start)/CLOCKS_PER_SEC);
     // displaying the result matrix
     //cout << endl << "Multiplication of A and B" << endl;
    
     for (int i = 0; i < MAX; i++){
         for (int j = 0; j < MAX; j++)
             printf("%d ",matC[i][j]);
         printf("\n");
     }
    
    
    
     return 0;
    }
    

Solution

  • the threaded program runs slower because of two items.

    1. the threading takes extra time for context switches

    2. the creation/destruction of a thread takes time

    Those two items are the main 'extra' time takers.

    Also, you should note that this program is CPU bound, not I/O bound. A I/O bound program would benefit from threading, but; a CPU bound program is (significantly) delayed due to all the context switching, the creation of the threads, and the destruction of the threads.

    Note: the program could easily be made quicker via the use of a 'thread pool'.