Warning: This code is a solution for Project Euler Problem 50. If you don't want it spoiled, don't look here.
Here I have code that searches for a long sequence of consecutive prime numbers, which summed together are also a prime. At one point, I need to test whether a sum is prime.
I have two tests, which are ifdef'd in the function computeMaxPrime. The first checks the sum against a std::set of prime numbers. The second uses a Miller-Rabin test implemented by GMP. The function only gets called 6 times. When I use the first test, the function computeMaxPrime takes .12 seconds. When I use the second test, it only takes ~.00002 seconds. Can someone explain how that's possible? I wouldn't think 6 calls to check whether a number is in a set would take 100 ms. I also tried using an unordered_set, and it performs the same.
I thought that maybe it was a timing issue, but I've verified it via timing the whole program execution from Terminal (on OSX). I've also verified that if I change the test to use the Miller-Rabin test first, and then confirm using the set, it makes a single call to the set and the clock reports .02 seconds, exactly what I would expect (1/6th the total time of only using the set test).
#include "PrimeGenerator2.h"
#include <set>
#include <stdio.h>
#include <time.h>
#include <gmp.h>
typedef std::set<u_int64t> intSet;
bool isInIntSet (intSet set,
u_int64t key)
{
return (set.count(key) > 0);
}
bool isPrime (u_int64t key)
{
mpz_t integ;
mpz_init (integ);
mpz_set_ui (integ, key);
return (mpz_probab_prime_p (integ, 25) > 0);
}
void computeInitialData (const u_int64t limit,
intSet *primeSet,
intList *sumList,
u_int64t *maxCountUpperBound)
{
PrimeSieve sieve;
u_int64t cumSum = 0;
u_int64t pastUpperBound = 0;
*maxCountUpperBound = 0;
for (u_int64t prime = sieve.NextPrime(); prime < limit; prime = sieve.NextPrime()) {
primeSet->insert(prime);
cumSum += prime;
sumList->push_back(cumSum);
if (cumSum < limit)
(*maxCountUpperBound)++;
else
pastUpperBound++;
}
}
u_int64t computeMaxPrime (const u_int64t limit,
const intSet &primeSet,
const intList &sumList,
const u_int64t maxCountUpperBound)
{
for (int maxCount = maxCountUpperBound; ; maxCount--) {
for (int i = 0; i + maxCount < sumList.size(); i++) {
u_int64t sum;
sum = sumList[maxCount + i] - sumList[i];
if (sum > limit)
break;
#if 0
if (isInIntSet (primeSet, sum))
return sum;
#else
if (isPrime (sum))
return sum;
#endif
}
}
return 0; // This should never happen
}
u_int64t findMaxCount (const u_int64t limit)
{
intSet primeSet; // Contains the set of all primes < limit
intList sumList; // Array of cumulative sums of primes
u_int64t maxCountUpperBound = 0; // Used an initial guess for the maximum count
u_int64t maxPrime; // Final return value
clock_t time0, time1, time2;
time0 = clock();
computeInitialData (limit, &primeSet, &sumList, &maxCountUpperBound);
time1 = clock();
maxPrime = computeMaxPrime (limit, primeSet, sumList, maxCountUpperBound);
time2 = clock();
printf ("%f seconds for primes \n" , (double)(time1 - time0)/CLOCKS_PER_SEC);
printf ("%f seconds for search \n" , (double)(time2 - time1)/CLOCKS_PER_SEC);
return maxPrime;
}
int main(void)
{
printf ("%lld\n", findMaxCount(1000000));
}
EDIT: Oh it's even weirder. Appears to have nothing to do with the STL set. If I do a hack to make isInIntSet just check how many times it's been called, it's equally slow compared to the GMP test. This makes me think I've likely just run across a compiler bug (EDIT2: Never blame the compiler!)
bool isInIntSet (intSet set, u_int64t key)
{
static int counter = 0;
counter++;
return (counter == 6);
}
Duh. The function isInIntSet is taking an intSet as an argument directly, so the entire set is being copied. I meant to pass by reference (intSet &set). That takes the search time down to .000003 seconds with an unordered_set.