Search code examples
scalaout-of-memorysieve-of-eratosthenes

Scala performance - Sieve


Right now, I am trying to learn Scala . I've started small, writing some simple algorithms . I've encountered some problems when I wanted to implement the Sieve algorithm from finding all all prime numbers lower than a certain threshold .

My implementation is:

import scala.math

object Sieve {

    // Returns all prime numbers until maxNum
    def getPrimes(maxNum : Int) = {
        def sieve(list: List[Int], stop : Int) : List[Int] = {
            list match {
                case Nil => Nil
                case h :: list if h <= stop => h :: sieve(list.filterNot(_ % h == 0), stop)
                case _ => list
            }
        }
        val stop : Int = math.sqrt(maxNum).toInt
        sieve((2 to maxNum).toList, stop)
    }

    def main(args: Array[String]) = {
        val ap = printf("%d ", (_:Int)); 

        // works
        getPrimes(1000).foreach(ap(_))

        // works 
        getPrimes(100000).foreach(ap(_))

        // out of memory
        getPrimes(1000000).foreach(ap(_))

    }
}

Unfortunately it fails when I want to computer all the prime numbers smaller than 1000000 (1 million) . I am receiving OutOfMemory .

Do you have any idea on how to optimize the code, or how can I implement this algorithm in a more elegant fashion .

PS: I've done something very similar in Haskell, and there I didn't encountered any issues .


Solution

  • I would go with an infinite Stream. Using a lazy data structure allows to code pretty much like in Haskell. It reads automatically more "declarative" than the code you wrote.

    import Stream._
    
    val primes = 2 #:: sieve(3)
    
    def sieve(n: Int) : Stream[Int] =
          if (primes.takeWhile(p => p*p <= n).exists(n % _ == 0)) sieve(n + 2)
          else n #:: sieve(n + 2)
    
    def getPrimes(maxNum : Int) = primes.takeWhile(_ < maxNum)
    

    Obviously, this isn't the most performant approach. Read The Genuine Sieve of Eratosthenes for a good explanation (it's Haskell, but not too difficult). For real big ranges you should consider the Sieve of Atkin.