Search code examples
rprimesdata-analysisany

Implementing function to detect if a number is prime using any() in R


Was exploring ways to avoid the for loop and instead make use of the any() function to implement a function that gives true when the passed argument is prime and false other wise.

Here is what I have:

prime2 <- function(n) {
  rangeOfNumbers <- range(2:(n-1))
  if(any(n%%rangeOfNumbers == 0)){
    return(FALSE)
  }
  else return(TRUE)
}

Looks straight forward, but prime(55) gives TRUE instead of false.

What am I doing wrong?


Solution

  • As you have stated, we can easily call an isprime function from many packages to obtain your answer quickly, but that is not your aim here. You are learning and trying to understand basic principals in R and programming in general. Many kudos!!

    As many have pointed out, the use of range here is inappropriate in this situation. This function simply returns the minimum and maximum of a given vector (See ?range for more info). So for your example of 55, since range(2:54) returns 2 and 54, 55 is not divisible by either of these numbers, hence the return of TRUE.

    As noted by @LAP, you need the colon : operator. So if you change range(2:(n-1)) to simply 2:(n-1), your algorithm will work (for the most part... it won't work for n <= 2).

    Since you state that you are in the process of learning, you should also think about ways of coding in a more efficient and thoughtful manner. In order to check whether a number is prime, we only need to check up to the square root of n. This will reduce many checks.

    Let's also think a little harder on how we can avoid further checks. We know when checking for primality, we only need to consider prime numbers. Since 2 is the only even prime number, we could first check if the input is even, and if it isn't, only check divisibility by odd numbers only. We can achieve the last bit by generating a sequence with fixed steps by calling the function seq or seq.int.

    Let's see this in action:

    ## Pseudo-Corrected OP function (2 should return TRUE, 1 should return FALSE, etc.)
    prime2 <- function(n) {
        rangeOfNumbers <- 2:(n-1)
        if(any(n%%rangeOfNumbers == 0)){
            return(FALSE)
        }
        else return(TRUE)
    }
    
    prime2Enhanced <- function(n) {
        if (n < 2)
            return(FALSE)
        else if (n %in% c(2, 3, 5, 7))
            return(TRUE)
        else if (n %% 2 == 0)
            return(FALSE)
        else if (any(n %% seq.int(3, sqrt(n), 2) == 0))
            return(FALSE)
        else
            return(TRUE)
    }
    
    all.equal(sapply(3:1000, prime2), as.logical(gmp::isprime(3:1000)))
    [1] TRUE
    
    all.equal(sapply(3:1000, prime2Enhanced), as.logical(gmp::isprime(3:1000)))
    [1] TRUE
    

    We can do even better by looking at extending our programming concepts to vectorization. This is when we can pass a vector and get the result all at one time instead of using a loop. This is a very important concept in R and I highly recommend learning about it. The R Inferno is an excellent resource for such a topic (see Circle 3).

    prime2Vectorized <- function(v) {
        result <- rep(TRUE, length(v))
        testVec <- as.integer(c(2, seq(3, sqrt(max(v)), 2)))
    
        ## Although we are using a loop here, we are taking
        ## advantage of vectorization concepts with each x
        for (x in testVec) {
            s <- which(v >= x^2)
            result[s[v[s] %% x == 0]] <- FALSE
        }
    
        result
    }
    
    all.equal(prime2Vectorized(3:1000), as.logical(gmp::isprime(3:1000)))
    [1] TRUE
    

    I will save it as an exercise to handle edge cases and further optimization.

    Now, let's see how much improvements we have in efficiency using the library microbenchmark:

    library(microbenchmark)
    microbenchmark(orig = sapply(3:1000, prime2),
                     improved = sapply(3:1000, prime2Enhanced),
                     vectorize = prime2Vectorized(3:1000))
    Unit: microseconds
         expr      min        lq      mean    median       uq       max neval
         orig 3863.279 4198.5595 5470.2178 4403.5050 4723.485 15775.377   100
     improved 1670.418 1740.1240 1937.2396 1809.6680 1935.365  9912.629   100
    vectorize  202.006  218.5505  306.4068  233.9045  249.696  6243.121   100
    

    Hopefully you can see that there are many fundamental concepts that can be taken away from this simple exercise. If you really want to learn about primality tests, you should check out Miller-Rabin primality test which is implemented in gmp, numbers, and other packages.