As an introduction to OpenMP, I'm trying to print a list of prime numbers.
// compile: gcc primes.c -fopenmp -o primes
#include <stdio.h> //printf()
#include <stdlib.h> //atoi() and malloc()
//#define ARR_MAX 1000
int isPrime(int n) {
if (n < 2)
return 0;
for (int i = 2; i < n; i++) {
if (n % i == 0)
return 0;
}
return 1;
}
int main(int argc, char *argv[]) {
int n;
if (argc < 2 || !(n = atoi(argv[1]))) // || n >= ARR_MAX)
return 0;
// int arr[ARR_MAX];
int *arr = (int *)malloc(sizeof(int) * (n + 1));
int counter = 0;
#pragma omp parallel for shared(arr)
for (int i = 0; i <= n; i++) {
if (isPrime(i)) {
arr[counter] = i;
printf("%2d ", arr[counter]);
counter++;
}
}
printf("\n");
for (int i = 0; i < counter; i++) {
printf("%2d ", arr[i]);
}
printf("\n");
free(arr);
}
I print the results twice.
Here's the same example, run 6 times:
$ ./primes 55
2 3 5 7 11 13 17 19 23 43 47 53 29 31 37 41
43 3 5 7 11 13 1 19 23 0 47 53 33605084 31 37 41
$ ./primes 55
29 31 37 41 2 3 5 7 11 13 43 47 53 17 19 23
43 31 37 41 43 3 5 7 11 13 9 47 53 0 19 23
$ ./primes 55
2 3 5 7 11 13 29 31 37 41 17 19 23 43 47 53
43 3 5 7 11 13 1 31 37 41 9 19 23 0 47 53
$ ./primes 55
29 31 37 41 43 47 53 2 3 5 7 11 13 17 19 23
43 31 37 41 43 47 53 0 3 5 7 11 13 0 19 23
$ ./primes 55
43 47 53 17 19 23 2 3 5 7 11 13 29 31 37 41
43 47 53 0 19 23 1 3 5 7 11 13 33605084 31 37 41
$ ./primes 55
29 31 37 41 2 3 5 7 11 13 43 47 53 17 19 23
43 31 37 41 43 3 5 7 11 13 9 47 53 0 19 23
I'm currently using malloced array, just to avoid having a max. But I also tried it with a static array. The code is still there, commented out, if you want to inspect it or try it. The same problem was present then.
There is a race condition in your code: you cannot safely do counter++
and arr[counter] = i
in this parallel loop because multiple thread can find primes at the same time.
A simple way to fix this is to use a critical section (with #pragma omp critical
so the content of the condition is executed by one thread at a time and counter
is protected. This is not so bad when n
is large since there are not a lot of prime number compared to classical numbers (AFAIK O(n/log(n))
prime numbers). This solution is certainly the best on platforms with few cores.
An alternative solution is to use atomics (wit #pragma omp atomic
), but they are not far better than a lock here on most mainstream architecture.
Another solution is to write the prime number found in a thread-local buffer and then copy it in a shared buffer. That being said, the order is not preserved. You can restore the order using a sort, but sorting algorithms tends to be a bit slow. A parallel sort can be relatively fast, but hard to implement efficiently.
A last solution is to allocate a 8-bit mask array of size n
initialized to 0 and flag prime numbers by setting the value to 1. This solution is certainly the best on many-core platforms. You can then travel the mask array to find the prime numbers. This can even be done in parallel in two steps: 1 parallel reduction to count the number of values and one to fill the final array containing prime numbers.
Also please note that the order of printf
calls is not deterministic in a parallel loop.