I need to make the Leibniz algorithm with pthread in c, now I have this code, but at the moment the threads implementation takes the same time of the sequential implementation, I think it is not running concurrently. Can someone see the error.
Thanks!!
#include<stdio.h>
#include<math.h>
#include<pthread.h>
#include<stdlib.h>
#define NUM_THREADS 2
#define ITERATIONS 100000000
double result = 0.0;
void *leibniz(void *threadid){
int size = ITERATIONS/NUM_THREADS;
int start = (long)threadid * size;
int end = ((long)threadid+1) * size;
int i;
for(i = start; i<end; i++){
int denom = 2*i+1;
result += pow(-1.0, i) * (1.0/denom);
}
}
int main(){
pthread_t threads[NUM_THREADS];
long t;
int rc;
// CREATE
for(t=0;t<NUM_THREADS;t++){
rc = pthread_create(&threads[t], NULL, leibniz, (void *)t);
if(rc){
printf("ERROR: return code %d\n", rc);
}
}
// JOIN
for(t=0;t<NUM_THREADS;t++){
rc = pthread_join(threads[t], NULL);
if(rc){
printf("ERROR: return code %d\n", rc);
exit(-1);
}
}
printf("Pi %f\n", result*4);
exit(0);
}
Thanks to Jean-François Fabre I did these changes, now it works!
double result=0.0;
void *leibniz(void *threadid){
double local = 0.0;
int size = ITERATIONS/NUM_THREADS;
int start = (long)threadid * size;
int end = ((long)threadid+1) * size;
int i;
for(i = start; i<end; i++){
local += (i%2==0 ? 1 : -1) * (1.0/(2*i+1));
}
result += local*4;
}
I'll attempt to answer.
Even if your application is multithreaded, it's not guaranteed that there's 1 FPU per core. I know little about that but I think that some AMD processors actually share the FPU between cores.
Since your loop is basically adding and pow
, it's 99% FPU computation, so if the FPU is shared on your computer, it explains the bottleneck.
You could reduce FPU usage by not calling pow
just to compute -1
or 1
, which would be a scalar operation, and maybe would make a difference then. just use -1
if i
is odd, 1
otherwise, or negate an external 1/-1 variable at each iteration.
Also, in order to avoid race conditions, accumulate the result in a local result, and add it in the end (protecting the addition by a mutex in the end would be even better)
double result = 0.0;
void *leibniz(void *threadid){
double local = 0.0;
int size = ITERATIONS/NUM_THREADS;
int start = (long)threadid * size;
int end = ((long)threadid+1) * size;
int i;
for(i = start; i<end; i++){
int denom = 2*i+1;
// using a ternary/scalar speeds up the "pow" computation, multithread or not
local += (i%2 ? -1 : 1) * (1.0/denom);
}
// you may want to protect that addition with a pthread_mutex
// start of critical section
result += local;
// end of critical section
}
http://wccftech.com/amd-one-fpu-per-core-design-zen-processors/