I'm about to implement the Sieve of Eratosthenes and have a general question regarding the sieve-array.
I've implemented the sieve quite a few times now (in C) and always used an array of uint8_t
(out of <stdint.h>
) as the sieve. This is pretty memory inefficient, since 8 bits are used for each number to sieve, even though one bit should be sufficient.
How would I go about this in C? I need an array of bits. I could pretty much create an array of any type (uint8_t
, uint16_t
, uint32_t
, uint64_t
) and access the single bits with bit masks, etc.
What data type should I prefer and what operations should I use to access the bits without performance loss?
PS: I don't think this is a duplicate of just a BitArray implementation, since it the question is specific about the Sieve of Eratosthenes, since it's main nature needs to be efficient (not only in memory usage, but in access). I was thinking, that maybe different tricks can be used to make the sieving process more efficient...
As mentioned by Weather Vane in his comments, you can save additional space by only considering every other number, since all even number except 2 are non-prime.
So in your bit array, each bit represents an odd number.
Here's an implementation I did a few years back using this technique.
#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <time.h>
#include <math.h>
#include <stdint.h>
uint8_t *num;
int count = 0;
FILE *primefile;
int main(int argc, char *argv[])
{
int i,j,root;
time_t t;
if (argc>1) count=atoi(argv[1]);
if (count < 100) {
fprintf(stderr,"Invalid number\n");
exit(1);
}
if ((num=calloc(count/16,1))==NULL) {
perror("calloc failed");
exit(1);
}
if ((primefile=fopen("primes.dat","w"))==NULL) {
perror("Coundn't open primes.dat");
exit(1);
}
t=time(NULL);
printf("Start:\t%s",ctime(&t));
root=floor(sqrt(count));
// write 2 to the output file
i=2;
if (fwrite(&i,sizeof(i),1,primefile)==0) {
perror("Couldn't write to primes.dat");
}
// process larger numbers
for (i=3;i<count;i+=2) {
if ((num[i>>4] & (1<<((i>>1)&7)))!=0) continue;
if (fwrite(&i,sizeof(i),1,primefile)==0) {
perror("Couldn't write to primes.dat");
}
if (i<root) {
for (j=3*i;j<count;j+=2*i) {
num[j>>4]|=(1<<((j>>1)&7));
}
}
}
t=time(NULL);
printf("End:\t%s",ctime(&t));
fclose(primefile);
return 0;
}
Here, num
is the bit array, allocated dynamically based on the upper bound of your search. So if you were looking for all primes up to 1000000000 (1 billion), it uses 64000000 (64 million) bytes of memory.
The key expressions are as follows:
For a "normal" bit array:
Set bit i
:
num[i>>3] |= (1<<(i&7);
// same as num[i/8] |= (1<<((i%8));
Check bit i
:
(num[i>>3] & (1<<(i&7))) != 0
// same as (num[i/8] & (1<<(i%8))) != 0
Since we're only keeping track of every other number, we divide i
by 2 (or equivalently, shift right by 1:
num[i>>4] |= (1<<((i>>1)&7);
// same as num[(i/2)/8] |= (1<<(((i/2)%8));
(num[i>>4] & (1<<((i>>1)&7))) != 0
// same as (num[(i/2)/8] & (1<<((i/2)%8))) != 0
In the above code, there are some micro-optimizations where division and modulus by powers of 2 are replaced with bit shifts and bitwise-AND masks, but most compilers should do that for you.