First up, despite this being an implementation of the Sieve of Eratosthenes this is not a homework question. I can find implementations of the Sieve in many intro books :).
The question I have is that around my confusion around referential transparency, which was one of the virtues of functional programming.
My understanding was that I could replace f(x)
anywhere in my program with y
, provided y = f(x)
and I was in a pure function.
A naive implementation of the n-th prime using the Sieve "by hand" is
nums0 = [2..] -- allowed by lazy evaluation
nums1 = [n | n <- nums0, mod n (head nums0) /= 0] -- works, gives [3,5,7,9,11,13,15...]
nums2 = [n | n <- nums1, mod n (head nums1) /= 0] -- works, gives [5,7,11,13,...]
nums3 = [n | n <- nums2, mod n (head nums2) /= 0] -- works, gives [7,11,13,...]
It is clear that there is a recursive pattern here. Let's call ndnp m
the function that returns the list of natural numbers greater than 1 that are not divisible by the first m
primes i.e.
nums0 = ndnp 0 -- (i.e. all numbers starting at 2)
nums1 = ndnp 1 -- (not divisible by 2)
nums2 = ndnp 2 -- (not divisible by 2 or 3)
To find the Nth prime, we can simply do head $ ndnp (n-1)
.
The example of constructing the numsX
array has a recursive structure:
-- ndnp: Not Divisible by first N Primes
ndnp :: Integer -> [Integer]
ndnp 0 = [2..]
ndnp n = [n | n <- (ndnp (n-1)), mod n (head $ ndnp (n-1)) /= 0]
In particular, ndnp 1
is built from ndnp 0
or nums0
When run take 10 $ ndnp 0
I get the answer I expect. When I run take 10 $ ndnp 1
I get a stackoverflow.
My question:
a = ndnp 1
is legal. Evaluation is lazy, which I understand.take
, I get a stack overflow (which suggests it is trying to evaluate the whole structure). I could understand if mod
or something forced eager evaluation, but I know this isn't the case because ...take 10 nums1
works just fine, and it is build off the same steps of looping through an infinite, lazily eval'ed list!nums0 = ndnp 0
, why is the execution of take 10 $ nums1
fine (which uses nums0
and iterates over it), but take 10 $ ndnp 1
broken (which uses the result of ndnp 0
and iterates over it)? Referrential transparency suggests I should be able to swap one out for the other, but in one case I retain lazy evaluation and the other generates a stack overflow!Note: There is a fix for this I found
ndnp :: Integer -> [Integer]
ndnp 0 = [2..]
ndnp n = [n | n <- arr, mod n (head $ arr) /= 0]
where arr = ndnp (n-1)
which makes my code work, but doesn't give me a good conceptual understanding on when I have referential transparency and when I don't.
If you have warnings turned on (which is good practice -- it's kind of a mystery to me that so many compilers, GHC included, default to not printing warnings), then you'll see something like:
Referential.hs:6:15: warning: [-Wname-shadowing]
This binding for ‘n’ shadows the existing binding
bound at Referential.hs:6:6
|
6 | ndnp n = [n | n <- (ndnp (n-1)), mod n (head $ ndnp (n-1)) /= 0]
^
which will clue you in on what's going wrong. The Haskell scoping rules mean that your original definition of ndnp
is equivalent to:
ndnp n = [m | m <- (ndnp (n-1)), mod m (head $ ndnp (m-1)) /= 0]
^ ^ ^ ^
That is, your new definition of n
doesn't capture the n
in the first occurrence of ndnp (n-1)
, but it does capture all the other n
s in the comprehension.
When you try to evaluate ndnp 1
you get:
ndnp 1 = [m | m <- ndnp 0, mod m (head $ ndnp (m-1)) /= 0]
and the list comprehension starts with m = 2
which calls ndnp (m-1) = ndnp 1
within the guard, resulting in an infinite loop.
You can fix the problem by renaming the newly introduced iteration variable in the list comprehension:
ndnp n = [n' | n' <- (ndnp (n-1)), mod n' (head $ ndnp (n-1)) /= 0]
Observe that substituting n = 1
gives:
ndnp 1 = [n' | n' <- ndnp 0, mod n' (head $ ndnp 0) /= 0]
and then substituting ndnp 0 = nums0
gives:
ndnp 1 = [n' | n' <- nums0, mod n' (head nums0) /= 0]
which precisely matches your definition of nums1
up to variable substitution.
So, there's no violation of referential transparency in this example.
The fix you found:
ndnp n = [n | n <- arr, mod n (head $ arr) /= 0]
where arr = ndnp (n-1)
is scoped differently. The Haskell scoping rules mean that it's equivalent to:
ndnp n = [m | m <- arr, mod m (head $ arr) /= 0]
^ ^ ^
where arr = ndnp (n-1)
See how the new variable m
captures every occurrence of m
in the list comprehension, but doesn't capture the n
in the definition of arr
? This means that this is equivalent to:
ndnp n = [m | m <- ndnp (n-1), mod m (head $ ndnp (n-1)) /= 0]
^ ^ ^
where neither ndnp (n-1)
is captured, making it equivalent to my "fixed" version above, up to variable renaming:
ndnp n = [n' | n' <- (ndnp (n-1)), mod n' (head $ ndnp (n-1)) /= 0]
^^ ^^ ^^