haskell primes sieve-of-eratosthenes number-theory

Sieve of Eratosthenes in Haskell

I'm solving some classic problems in Haskell to develop my functional skills and I have a problem to implement an optimization suggested at this "Programming Praxis" site:

I have three solutions to this problem and the third one is too slow compared to the second solution. Can someone suggest some improvements to my code?

My implementations are:

-- primeira implementação
primes n
    | n < 2 = []
    | n == 2 = [2]
    | n `mod` 2 == 0 = primes'
    | otherwise = if (find (\x -> n `mod` x == 0) primes') == Nothing then
                      n:primes'
                  else
                      primes'
    where primes' = primes (n - 1)

-- segunda implementação
primes' :: Integer -> [Integer]
primes' n = sieve $ 2 : [3,5..n]
    where sieve :: [Integer] -> [Integer]
          sieve [] = []
          sieve l@(x:xs)
              | x*x >= n = l
              | otherwise = x : sieve list'
              where list' = filter (\y -> y `mod` x /= 0) xs

-- terceira implementação
primes'' :: Integer -> [Integer]
primes'' n = 2 : sieve 3 [3,5..n]
    where sieve :: Integer -> [Integer] -> [Integer]
          sieve _ [] = []
          sieve m l@(x:xs)
              | m*m >= n = l
              | x < m*m = x : sieve m xs
              | otherwise = sieve (m + 2) list'
              where list'= filter (\y -> y `mod` m /= 0) l

Solution

Looks to me like the problem with your third revision is how you choose the next element to sift on. You indiscriminately increment by 2. The problem is that you then sift on unnecessary numbers. for example, in this version your eventually going to pass 9 as m, and you're going to do an extra recursion to filter on 9, even though it isn't even in the list, and thus you should have never picked it in the first place (since it would have been removed in the very first filter on 3)

Even though the second version doesn't start the filtering past the square of the number it sifts on, it never chooses an unnecessary sifting value.

In other words, I think you end up sifting on every odd number between 3 and n. Instead you should be sifting on every odd number that hasn't already been removed by a previous pass.

I think to correctly implement the optimization of starting the sieve at the square of the current sift value, you have to retain the front of the list while sifting on the back where back contains the elements >= the square of the sift value. I think this would force you to use concatenations, and I'm not so sure that the optimization is good enough to cancel out the overhead induced by using ++.