My goal is to create a program that evaluates the performance gains from increasing the number of threads the program can use. I evaluate the performance by using the Monte Carlo method to calculate pi. Each thread should generate random (x, y) coordinates and check whether each coordinate lies within the circle. If it does, the inCircle
counter should be incremented. Pi is calculated as follows: 4 * inCircle/tries
. Using pthread_join
, there are no performance gains in a problem that should benefit from multiple threads. Is there some way to allow multiple threads to increment a counter without having to wait for each individual thread?
#include <stdio.h>
#include <string.h>
#include <pthread.h>
#include <stdlib.h>
#include <unistd.h>
#include <time.h>
#include <stdbool.h>
#include <pthread.h>
#define nPoints 10000000
#define NUM_THREADS 16
int inCircle = 0;
int count = 0;
double x,y;
pthread_mutex_t mutex;
bool isInCircle(double x, double y){
    if(x*x+y*y<=1){
        return true;
    }
    else{
        return false;
    }
}

void *piSlave(){
    int myCount = 0;
    time_t now;
    time(&now);
    srand((unsigned int)now);
    for(int i = 1; i <= nPoints/NUM_THREADS; i++) {
        x = (double)rand() / (double)RAND_MAX;
        y = (double)rand() / (double)RAND_MAX;
        if(isInCircle(x,y)){
            myCount++;
        }
    }
    pthread_mutex_lock(&mutex);
    inCircle += myCount;
    pthread_mutex_unlock(&mutex);
    pthread_exit(0);
}

double piMaster()
{
    pthread_t threads[NUM_THREADS];
    int rc;
    long t;
    for(t=0; t<NUM_THREADS; t++){
        printf("Creating thread %ld\n", t);
        rc = pthread_create(&threads[t], NULL, piSlave, (void *)t);
        if (rc){
            printf("ERROR; return code from pthread_create() is %d\n", rc);
            exit(-1);
        }
        //pthread_join(threads[t], NULL);
    }
    //wait(NULL);
    return 4.0*inCircle/nPoints;
}

int main()
{
    printf("%f\n",piMaster());
    return(0);
}
There are a few issues with the code.
The piMaster()
function should wait for the threads it created. We can do this by simply running pthread_join()
in a loop:
for (t = 0; t < NUM_THREADS; t++)
    pthread_join(threads[t], NULL);
We can then atomically increment the inCircle
counter once at the end of each thread's loop, so no locks are needed. The variable must be declared with the _Atomic
keyword, as described in the Atomic operations C reference:
_Atomic long inCircle = 0;

void *piSlave(void *arg)
{
    [...]
    inCircle += myCount;
    [...]
}
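In C11, += on an _Atomic variable is already a single atomic read-modify-write. If you prefer to make that explicit, the equivalent call is atomic_fetch_add() from <stdatomic.h>; this is just a minimal sketch of that alternative, not a required change:
#include <stdatomic.h>

_Atomic long inCircle = 0;

void *piSlave(void *arg)
{
    [...]
    /* explicit atomic read-modify-write, same effect as inCircle += myCount */
    atomic_fetch_add(&inCircle, myCount);
    [...]
}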
This generates the correct CPU instructions to atomically increment the variable. For example, on the x86
architecture a lock
prefix appears, as we can confirm in the disassembly:
29 inCircle += myCount;
0x0000000100000bdb <+155>: lock add %rbx,0x46d(%rip) # 0x100001050 <inCircle>
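(You can check this on your own build by disassembling the binary, e.g. with objdump -dS or gdb's disassemble /s command, provided it was compiled with -g; the exact addresses will differ on your machine.)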
There is also little point in using rand()
here: it relies on hidden shared state, so it is not guaranteed to be thread-safe, and all threads would contend on that state. Instead, we can simply scan the whole circle in a loop, as described on the Approximations of π Wikipedia page:
for (long x = -RADIUS; x <= RADIUS; x++)
    for (long y = -RADIUS; y <= RADIUS; y++)
        myCount += isInCircle(x, y);
So here is the code after the changes above. Note that π is now computed as inCircle / (RADIUS * RADIUS): the number of lattice points inside the circle approximates its area, π * RADIUS * RADIUS, so there is no factor of 4 this time:
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

#define RADIUS 10000L
#define NUM_THREADS 10

_Atomic long inCircle = 0;

static inline long isInCircle(long x, long y)
{
    return x * x + y * y <= RADIUS * RADIUS ? 1 : 0;
}

void *piSlave(void *arg)
{
    long myCount = 0;
    long tid = (long)arg;

    /* Interleave the work: thread tid scans every NUM_THREADS-th column of the grid. */
    for (long x = -RADIUS + tid; x <= RADIUS + tid; x += NUM_THREADS)
        for (long y = -RADIUS; y <= RADIUS; y++)
            myCount += isInCircle(x, y);

    printf("\tthread %ld count: %ld\n", tid, myCount);
    inCircle += myCount;   /* atomic increment, no lock needed */
    pthread_exit(0);
}

double piMaster()
{
    pthread_t threads[NUM_THREADS];
    long t;

    for (t = 0; t < NUM_THREADS; t++) {
        printf("Creating thread %ld...\n", t);
        if (pthread_create(&threads[t], NULL, piSlave, (void *)t)) {
            perror("Error creating pthread");
            exit(-1);
        }
    }
    for (t = 0; t < NUM_THREADS; t++)
        pthread_join(threads[t], NULL);

    return (double)inCircle / (RADIUS * RADIUS);
}

int main()
{
    printf("Result: %f\n", piMaster());
    return (0);
}
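To try it, build with pthread support, e.g. gcc -O2 -pthread pi.c -o pi (the file name pi.c is just an example), and run the resulting binary.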
And here is the output:
Creating thread 0...
Creating thread 1...
Creating thread 2...
Creating thread 3...
Creating thread 4...
Creating thread 5...
Creating thread 6...
Creating thread 7...
Creating thread 8...
Creating thread 9...
thread 7 count: 31415974
thread 5 count: 31416052
thread 1 count: 31415808
thread 3 count: 31415974
thread 0 count: 31415549
thread 4 count: 31416048
thread 2 count: 31415896
thread 9 count: 31415808
thread 8 count: 31415896
thread 6 count: 31416048
Result: 3.141591