I have a BSP implementation in C for the Sieve Of Erastothenes, see the code below.
When executed with ./bspsieve 2 100 it however gives the following output:
"It took : 0.000045 seconds for proc 0 out of 2. 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97,"
for ./bspsieve 1 100 it gives the same, i.e:
"./bspsieve 1 100
It took : 0.000022 seconds for proc 0 out of 1.
23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97,"
For ./bspsieve 8 100 (so using 8 processors) it gives a segmentation fault. i.e "./bspsieve 8 100 It took : 0.000146 seconds for proc 0 out of 8. Segmentation fault (core dumped)" This means that my bounds aren't okay I think?
It fails to find the first primes! I can't find my fault (really inexperienced with C). Except this, are there other improvements to my code you guys can suggest? The algorithm doesn't need to be fast, but any improvement in understandability and readability is welcome.
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <mcbsp.h>
/*
Note: To compile, this file has to be in the same folder as mcbsp.h and you need the 2 following commands:
gcc -Iinclude/ -pthread -c -o bspsieve.o bspsieve.c
gcc -o bspsieve bspsieve.o lib/libmcbsp1.1.0.a -lpthread -lrt
*/
int procs;
int upperbound;
int *primes;
//SPMD function
void bspSieve(){
bsp_begin(procs);
int p = bsp_nprocs(); // p = number of procs obtained
int s = bsp_pid(); // s = proc number
float blocksize; // block size to be used, note last proc has a different size!
if( s != p-1){
blocksize = ceil(upperbound/p);
} else {
blocksize = upperbound - (p-1)*ceil(upperbound/p);
}
// Initialize start time and end time, set start time to now.
double start_time,end_time;
start_time = bsp_time();
// Create vector that has block of candidates
int *blockvector;
blockvector = (int *)malloc(blocksize*sizeof(int));
int q;
for(q = 0; q<blocksize; q++){
//List contains the integers from s*blocksize till blocksize + s*blocksize
blockvector[q] = q + s*blocksize;
}
//We neglect the first 2 'primes' in processor 0.
if(s == 0){
blockvector[0] = 0;
blockvector[1] = 0;
}
// We are using the block distribution. We assume that n is large enough to
// assure that n/p is larger than sqrt(n). This means that we will always find the
// sieving prime in the first block, and so have to broadcast from the first
// processor to the others.
long sieving_prime;
int i;
bsp_push_reg( &sieving_prime,sizeof(long) );
bsp_sync();
for(i = 2; i * i < upperbound; i++) {
//Part 1: if first processor, get the newest sieving prime, broadcast. Search for newest prime starting from i.
if(s == 0){
int findPrimeNb;
for(findPrimeNb = i; findPrimeNb < blocksize; findPrimeNb++) {
if( blockvector[findPrimeNb] != 0) {
sieving_prime = blockvector[findPrimeNb];
//broadcast
int procNb;
for(procNb = 0; procNb < p; ++procNb){
bsp_put(procNb, &sieving_prime,&sieving_prime,0,sizeof(long));
}
break;
}
}
}
bsp_sync();
//Part 2: Sieve using the sieving prime
int sievingNb;
for(sievingNb = 0; sievingNb < blocksize; sievingNb++){
//check if element is multiple of sieving prime, if so, pcross out (put to zero)
if( blockvector[sievingNb] % sieving_prime == 0){
blockvector[sievingNb] = 0;
}
}
}
//part 3: get local primes to central area
int transferNb;
long transferPrime;
for(transferNb = 0; transferNb < blocksize; transferNb++){
transferPrime = blockvector[transferNb];
primes[transferPrime] = transferPrime;
}
// take the end time.
end_time = bsp_time();
//Print amount of taken time, only processor 0 has to do this.
if( s == 0 ){
printf("It took : %.6lf seconds for proc %d out of %d. \n", end_time-start_time, bsp_pid(), bsp_nprocs());
fflush(stdout);
}
bsp_pop_reg(&sieving_prime);
bsp_end();
}
int main(int argc, char **argv){
if(argc != 3) {
printf( "Usage: %s <proc count> <upper bound> <n", argv[ 0 ] );
exit(1);
}
//retrieve parameters
procs = atoi( argv[ 1 ] );
upperbound = atoi( argv[ 2 ] );
primes = (int *)malloc(upperbound*sizeof(int));
// init and call parallel part
bsp_init(bspSieve, argc, argv);
bspSieve();
//Print all non zeros of candidates, these are the primes.
// Primes only go to p*p <= n
int i;
for(i = 0; i < upperbound; i++) {
if(primes[i] > 0) {
printf("%d, ",primes[i]);
fflush(stdout);
}
}
return 0;
}
Troubles may come from
blockvector[q] = q + s*blocksize;
As long as blocksize
is equal to ceil(upperbound/p)
on all processus, there is no problem. Since 1 and 2 are divisor of 100, your program works well.
As you wrote in your code, it is not always the case...It is not true on the last processus when calling ./bspsieve 8 100
. Some values in blockvector are above 100, and a segmentation fault is likely to appear when writting in the prime
array.
A way to correct this behaviour is :
blockvector[q] = q + s*ceil(upperbound/p);
(store ceil(...)
to run faster.)
It might also be better to zero the prime
array before using it.
I did not check wheather it works...Try it !
Bye,
Francis