What is the difference between a cyclic list and an infinite list in haskell?

Referencing @dfeuer's answer to this question: Least expensive way to construct cyclic list in Haskell, which says that using cyclic lists 'defeats' the garbage collector as it has to keep everything you've consumed from a cyclic list allocated till you drop the reference to any cons cells in the list.

Apparently in Haskell a cyclic list and an infinite list are two separate things. This blog (https://unspecified.wordpress.com/2010/03/30/a-doubly-linked-list-in-haskell/) says that if you implement cycle as follows:

cycle xs = xs ++ cycle xs

it is an infinite list, not a cyclic list. To make it cyclic you have to implement it like this (as is found in the Prelude source code):

cycle xs = xs' where xs' = xs ++ xs'

What exactly is the difference between these two implementations? And why is it that if you are holding onto one cons cell somewhere in a cyclic list, that the garbage collector has to keep everything before it allocated as well?

Solution

The difference is entirely in the memory representation. From the point of view of the semantics of the language, they're indistinguishable—you can't write a function that can tell them apart, so your two versions of cycle are considered two implementations of the same function (they're the exact same mapping of arguments to results). In fact, I don't know if the language definition guarantees that one of those is cyclical and the other infinite.

But anyway, let's bring out the ASCII art. Cyclical list:

   +----+----+                 +----+----+
   | x0 |   ----->   ...   --->| xn |    |
   +----+----+                 +----+-|--+
     ^                                |
     |                                |
     +--------------------------------+

Infinite list:

   +----+----+
   | x0 |   ----->  thunk that produces infinite list
   +----+----+

The thing with the cyclical list is that from every cons cell in the list there is a path to all of the others and itself. This means that from the point of view of the garbage collector, if one of the cons cells is reachable, then all are. In the plain infinite list, on the other hand, there aren't any cycles, so from a given cons cell only its successors are reachable.

Note that the infinite list representation is more powerful than the cyclical one, because the cyclical representation only works with lists that repeat after some number of elements. For example, the list of all prime numbers can be represented as an infinite list, but not as a cyclical one.

Note also that this distinction can be generalized into two distinct ways of implementing the fix function:

fix, fix' :: (a -> a) -> a
fix  f = let result = f result in result
fix' f = f (fix' f)

-- Circular version of cycle:
cycle  xs = fix (xs++)

-- Infinite list version of cycle:
cycle' xs = fix' (xs++)

The GHC libraries go for my fix definition. The way GHC compiles code means that the thunk created for result is used both as the result and the argument of the application of f. I.e., the thunk, when forced, will call the object code for f with the thunk itself as its argument, and replace the thunk's contents with the result.