I wanted to write a program which will print and count all the primes between 1 and k. The idea is that we take each odd number i between 1 and k and check for odd factors of i up to sqrt(i). It took my computer 1 minute 47 seconds to go check for primes up to k = 1 000 000.
Then I tried to improve my code: instead of checking all odd numbers from 1 to k, I removed all the multiples of 3, 5, and 7 from these odd numbers, so basically I removed about half the numbers we have to check. I ran the program again, and with k = 1 000 000 I got a time of 1 minute and 40 seconds, only a 7 second improvement.
Here is my code:
#The program has some weird quirks
#(it doesn't print 3 and 5 because of some errors I was getting) but other than that
#it seems to work fine
#k is the number that we check up to
k <- 1000000
#n is the number of primes, initialized at 2
n <- 2
#We take only the odd numbers between 5 and k
y <- seq(5, k, 2)
#We take each member of i and check it for primality
for (i in y) {
#i is assumed to be prime until proved otherwise
prime <- TRUE
#We check the remainder when i is divided by every odd number less than its square root
for (j in c(2, seq(3, ceiling(sqrt(i)), 2))) {
if (i %% j == 0) {
#If it's found that some number divides i, we set prime to false and move on
#to the next i
prime <- FALSE
break
}
}
#If no number has divided i, we haven't broken the loop so R will get to this point
#We shouldn't need the if statement, but for some reason the n counter
#gets too big if I remove the statement
if (prime) {
print(i)
n <- n + 1
}
}
#Print out the number of primes
print(paste("There are", n, "prime numbers between", 1, "and", k))
In the version where I also remove multiples of 3, 5, and 7, I just define y as
y <- setdiff((setdiff((setdiff(seq(5, k, 2), seq(6, k, 3))), seq(10, k, 5))), seq(14, k, 7))
Sorry it's very messy, but it works.
Profile your code to find out where the bulk of the time is. As a general rule, don't use generics that have to determine their methods. Swap seq.int
for seq
. Especially in R, you can't just count the number of computations you think something will take and cut down on that. Actually look at how things are called.
Here's how to profile your code:
Call Rprof(tmp <- tempfile())
, then execute your algorithm, then call Rprof();summaryRprof(tmp)
. Look at the calls with a high total.pct
, and make your adjustments there.
If you want a fast function for quickly determining all the primes up to some number in R, here's one for you. I don't think I wrote it, but my file doesn't tell me where I got it. It's an implementation of the sieve of Eratosthenes.
sieve <- function(n){
n <- as.integer(n)
primes <- rep(TRUE, n)
primes[1] <- FALSE
last.prime <- 2L
fsqr <- floor(sqrt(n))
while (last.prime <= fsqr){
primes[seq.int(2L*last.prime, n, last.prime)] <- FALSE
sel <- which(primes[(last.prime+1):(fsqr+1)])
if(any(sel)){
last.prime <- last.prime + min(sel)
}else last.prime <- fsqr+1
}
which(primes)
}
EDIT: I most certainly didn't write this, though my memory made me think I might have implemented it in R from some other language, this is not the case. A quick google search of the code shows that I got it directly from a post here several years back when I was just a lurker. The first sieve answer there was posted by @GeorgeDontas: https://stackoverflow.com/a/3790309/7936744
EDIT 2: Now looking back through my files, I see my very minimal tweaks can eke a little bit more time out of this, and, probably unreliably, avoids a rare factor of 10 increase:
prime_logic <- function(n){
n <- as.integer(n)
primes <- rep_len(TRUE, n)
primes[1] <- FALSE
last.prime <- 2L
fsqr <- floor(sqrt(n))
while (last.prime <= fsqr){
primes[seq.int(2L*last.prime, n, last.prime)] <- FALSE
sel <- which(primes[seq.int(last.prime+1, fsqr+1)])
if(any(sel)){
last.prime <- last.prime + min(sel)
}else last.prime <- fsqr+1
}
primes
}
Unit: milliseconds
expr min lq mean median uq max neval cld
sieve(1e+06) 29.80531 36.41955 50.24195 38.53144 41.86716 889.47422 100 a
prime_logic(1e+06) 22.19734 33.05792 39.35390 35.52268 41.02838 71.68033 100 a