So, I am facing some difficulties using openMp. I am beginner and I do not know what I am doing wrong. This is a project for one of my courses at University, so I do not seek for the solution, rather for a hint or an explanation.
The project is to calculate the hamming distance between 2 strings that belong in different sets (lets say setA and setB). Those two sets may contain 100,1000 or 10000 strings which each one of them is composed by the same length of chars.
My problem is that despite the fact that I have decreased the execution time of the parallel program it still takes more time than the serial algorithm.
So, I attach my codes for showing what I have done so far.
serial C code.
void main(int argc,char **argv)
{
//initialize sets' number and string's length
int m=atoi(argv[1]),n=atoi(argv[2]),I=atoi(argv[3]);
int i=0,j=0,l=0,TotalHammingDistance=0,count;
//creation of 2-dimentional matrices for setA and setB
char **setA = malloc(m * sizeof(char *)); // Allocate row pointers
for(i = 0; i < m; i++)
setA[i] = malloc((I+1) * sizeof(char)); // Allocate each row separatel
char **setB = malloc(n * sizeof(char *)); // Allocate row pointers
for(i = 0; i < n; i++)
setB[i] = malloc((I+1) * sizeof(char)); // Allocate each row separatel
// initialize matrices with random string (0 and 1)
for (i=0;i<m;i++){
for(j=0;j<I;j++){
setA[i][j]="0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"[rand() % 62];
}
setA[i][I]='\0';
}
for (i=0;i<n;i++){
for(j=0;j<I;j++){
setB[i][j]="0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"[rand() % 62];
}
setB[i][I]='\0';
}
//creation of m*n matrix to store all hamming distances and initialize it
int **HamDist = malloc(m * sizeof(int *)); // Allocate row pointers
for(i = 0; i < m; i++)
HamDist[i] = malloc(n * sizeof(int));
for(i=0;i<m;i++){
for(j=0;j<n;j++){
HamDist[i][j]=0;
}
}
clock_t start=clock();
//Calculate hamming distance for all combinations of the strings
for (i=0;i<m;i++){
for(j=0;j<n;j++){
count=0;
for(l=0;l<=I;l++) {
if (setA[i][l] != setB[j][l])
count++;
}
HamDist[i][j]=count;
TotalHammingDistance+=HamDist[i][j];
}
}
clock_t end =clock();
double hamm_time=(double)(end-start)/CLOCKS_PER_SEC;
printf("\n|Total Hamming execution time= %f",hamm_time);
printf("\n\n*|The Total Hamming Distance is: %d\n",TotalHammingDistance );
}
OpenMp C code
void main(int argc,char **argv)
{
//initialize sets' number and string's length
int m=atoi(argv[1]),n=atoi(argv[2]),I=atoi(argv[3]);
int i=0,j=0,TotalHammingDistance=0, tid,nthreads,chunk;
//creation of 2-dimentional matrices for setA and setB
char **setA = malloc(m * sizeof(char *)); // Allocate row pointers
for(i = 0; i < m; i++)
setA[i] = malloc((I+1) * sizeof(char)); // Allocate each row separatel
char **setB = malloc(n * sizeof(char *)); // Allocate row pointers
for(i = 0; i < n; i++)
setB[i] = malloc((I+1) * sizeof(char)); // Allocate each row separatel
// initialize matrices with random string (0 and 1)
for (i=0;i<m;i++){
for(j=0;j<I;j++){
setA[i][j]="0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"[rand() % 62];
}
setA[i][I]='\0';
}
for (i=0;i<n;i++){
for(j=0;j<I;j++){
setB[i][j]="0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"[rand() % 62];
}
setB[i][I]='\0';
}
//creation of m*n matrix to store all hamming distances and initialize it
uint16_t **HamDist = malloc(m * sizeof(uint16_t *)); // Allocate row pointers
for(i = 0; i < m; i++)
HamDist[i] = malloc(n * sizeof(uint16_t));
for(i=0;i<m;i++){
for(j=0;j<n;j++){
HamDist[i][j]=0;
}
}
printf("\n HamDist set \n" );
int count=0;
clock_t start=clock();
omp_set_num_threads(2);
#pragma omp parallel shared(setA, setB,HamDist )
{
int k,p,l,count=0;
#pragma omp for schedule(dynamic, 10000)
for (k=0;k<m;k++){
for(p=0;p<n;p++){
count=0;
for(l=0;l<=I;l++){
if (setA[k][l] != setB[p][l]){
count++;
}
}
HamDist[k][p]=count;
}
}
}
clock_t end =clock();
double per_time=(double)(end-start)/CLOCKS_PER_SEC;
printf("\n|Total time for two sets= %f",per_time);
/**/
for (i=0;i<m;i++){
for(j=0;j<n;j++){
TotalHammingDistance+=HamDist[i][j];
}
}
printf("\n|Total execution time= %f",per_time);
printf("\n\n*|The Total Hamming Distance is: %d\n",TotalHammingDistance );
}
The execution time that I receive is around 42.011104 for the openmp program and about 32.876482 for the serial algorithm (m=n=10000 and I= 100, where m,n describes the number of strings at each set and I is the string length)
I strongly believe that the parallel program should takes less execution time. Any idea??
Thanks in advance!
Measuring multiprocessor performance is a bit more complicated but we can do a good approximation of "Does it work or not?" with time(1)
. If I do it with your code as it is (with GCC gcc-4.8.real (Ubuntu 4.8.5-2ubuntu1~14.04.1) 4.8.5 invoked with gcc -W -Wall -Wextra -O3 -fopenmp openmptest.c -o openmptest
) I got
$ time ./openmptest 10000 10000 100
HamDist set
|Total time for two sets= 9.620011
|Total execution time= 9.620011
*|The Total Hamming Distance is: 1248788142
real 0m9.815s
user 0m9.700s
sys 0m0.116s
Where both, real and user are roughly the same value and also roughly the same as the normal version. If I remove schedule(dynamic, 10000)
completely and let Openmp decide for itself I get
$ time ./openmptest 10000 10000 100
HamDist set
|Total time for two sets= 9.187761
|Total execution time= 9.187761
*|The Total Hamming Distance is: 1248788142
real 0m4.819s
user 0m9.265s
sys 0m0.112s
That is 5/9 instead of 9/9. If I set omp_set_num_threads(2)
to 4 instead (I have four CPUs here.) I get
$ time ./openmptest 10000 10000 100
HamDist set
|Total time for two sets= 11.438243
|Total execution time= 11.438243
*|The Total Hamming Distance is: 1248788142
real 0m3.080s
user 0m11.540s
sys 0m0.104s
That is 3/11 < 5/9 < 9/9. So it works as expected if you let OpenMP do it itself. Removing omp_set_num_threads()
gave no difference to the last try.
You have a very simple program where OpenMP's defaults work quite well. Fine-tuning OpenMP is a science in and of itself but for example @Davislor 's comment about using reduction
seems to be a good one to start with.
BTW: You also have a lot of warnings, one of them is about shadowing count
which you declared two times, one before the loop and one inside. You should get rid of all the warnings. It happens more often than not that a very significant information is hidden in between those dozens of warnings.