Using pthread library, I've had written some code for matrix multiplication.
AFAIK, the program using thread would be much faster than program without using thread. But, unlike my expectation... the outcome(Elapsed time) is totally reversed, the program not using thread is much faster than thread program. What happened.. I couldn't find why this is happened.
The Execution time below..
Time ( # of thread: 4 )
0.000451 sec
Time ( no thread )
0.000002 sec
I've coded this two version in one file(.c)
but always pThread is worse than sequential
.
Serial version ( no using thread )
void serial_multi()
{
for (int i = 0; i < MAX; i++)
for(int j = 0; j < MAX; j++)
for(int k = 0; k < MAX; k++)
_matC[i][j] += matA[i][k] * matB[k][j];
}
Using Thread ( # of thread : 4 )
int step_i = 0;
void* multi(void* arg)
{
int core = step_i++;
// each thread computes 1/4th of matrix multiplication
for (int i = core * MAX / 4; i < (core + 1) * MAX/4; i++)
for(int j = 0; j < MAX; j++)
for(int k = 0; k < MAX; k++)
matC[i][j] += matA[i][k] * matB[k][j];
return NULL;
}
main function
int main()
{
printf("PID: %d\n",getpid());
// generating random values in matA and matB
for (int i = 0; i < MAX; i++){
for (int j = 0; j < MAX; j++){
matA[i][j] = rand() % 10;
matB[i][j] = rand() % 10;
}
}
//cout << endl << "Matrix A" << endl;
for (int i = 0; i < MAX; i++){
for (int j = 0; j < MAX; j++)
printf("%d ",matA[i][j]);
printf("\n");
}
//cout << endl << "Matrix B" << endl;
for (int i = 0; i < MAX; i++){
for (int j = 0; j < MAX; j++)
printf("%d ",matB[i][j]);
printf("\n");
}
// declaring 4 threads
pthread_t threads[MAX_THREAD];
// creating 4 threads, each evaluating its own part
// Time Estimation
clock_t start = clock();
for (int i = 0; i < MAX_THREAD; i++){
pthread_create(&threads[i], NULL, multi, NULL);
}
// joining and waiting for all threads to complete
for (int i = 0; i < MAX_THREAD; i++)
pthread_join(threads[i], NULL);
clock_t end = clock();
printf("Time: %lf\n", (double)(end-start)/CLOCKS_PER_SEC);
// displaying the result matrix
//cout << endl << "Multiplication of A and B" << endl;
for (int i = 0; i < MAX; i++){
for (int j = 0; j < MAX; j++)
printf("%d ",matC[i][j]);
printf("\n");
}
return 0;
}
the threaded program runs slower because of two items.
the threading takes extra time for context switches
the creation/destruction of a thread takes time
Those two items are the main 'extra' time takers.
Also, you should note that this program is CPU bound, not I/O bound. A I/O bound program would benefit from threading, but; a CPU bound program is (significantly) delayed due to all the context switching, the creation of the threads, and the destruction of the threads.
Note: the program could easily be made quicker via the use of a 'thread pool'.