Unexpected memory usage for cycle function

In the following program, I would only expect cycle3 to run with constant memory (it explicitly ties the knot). However, cycle2 also runs in constant memory for reasons I cannot fathom. I expect cycle2 to do the exact same work as cycle1 because

xs' = xs ++ xs'
xs' = xs ++ xs ++ xs' -- substitute value of xs'
xs' = xs ++ xs ++ xs ++ xs' -- substitute value of xs' again
xs' = xs ++ xs ++ xs ++ ... -- and so on

Can someone explain what I'm missing here?

module Main where

import System.Environment (getArgs)

cycle1 :: [a] -> [a]
cycle1 [] = error "empty list"
cycle1 xs = xs ++ cycle1 xs

cycle2 :: [a] -> [a]
cycle2 [] = error "empty list"
cycle2 xs = xs' where xs' = xs ++ xs'

cycle3 :: [a] -> [a]
cycle3 [] = error "empty list"
cycle3 xs = let
  xs' = go xs' xs
  in xs'
  where
    go :: [a] -> [a] -> [a]
    go start [last] = last : start
    go start (x:xs) = x : go start xs

testMem :: (Show a) => ([a] -> [a]) -> [a] -> IO ()
testMem f xs = print (xs', xs') -- prints only first val, need second val holds onto reference
  where
    xs' = f xs

main :: IO ()
main = do
  args <- getArgs
  let mCycleFunc = case args of
        ["1"] -> Just cycle1
        ["2"] -> Just cycle2
        ["3"] -> Just cycle3
        _ -> Nothing
  case mCycleFunc of
    Just cycleFunc -> testMem cycleFunc [0..8]
    Nothing -> putStrLn "Valid args are one of {1, 2, 3}."

Solution

It comes down to sharing or non-sharing of equal thunks. Two equal thunks are thunks that are guaranteed to produce the same result. In the case of cycle1, you're creating a new thunk for cycle1 xs each time you hit the [] at the end of xs. New memory needs to be allocated for that thunk, and its value needs to be computed from scratch, which allocates new list pairs as you go through it.

I think the way cycle2 avoids this becomes easier to understand if you rename xs' to result (and I removed the "error on []" case):

cycle2 :: [a] -> [a]
cycle2 xs = result 
    where result = xs ++ result

This definition is semantically equivalent to cycle1 (produces the same results for the same arguments), but the key to understanding the memory usage is to look at it in terms of what thunks get created. When you execute the compiled code for this function, all it does right away is create a thunk for result. You can think of the thunk as a mutable type more or less like this (total made-up pseudocode):

type Thunk a = union { NotDone (ThunkData a), Done a }
type ThunkData a = struct { force :: t0 -> ... -> tn -> a
                          , subthunk0 :: t0
                          , ...
                          , subthunkn :: tn }

This is either a record containing pointers to the thunks for the values needed plus a pointer to the code that forces these thunks, or just the result of the calculation. In the case of cycle2, the thunk for result points to the object code for (++) and the thunks for xs and result. This last bit means that the thunk for result has a pointer back to itself, which explains the constant space behavior; the last step in forcing result is to make it point back to itself.

In the case of cycle1, on the other hand the thunk has the code for (++), a thunk for xs, and a new thunk to calculate cycle1 xs from scratch. In principle it would be possible for the compiler to recognize that the reference to this latter thunk could be substituted with one to the "parent" chunk, but the compiler doesn't do that; whereas in cycle2 it can't help but do it (one instantiated binding of one variable = one chunk).

Note that this self-referential thunk behavior can be factored out into a suitable implementation of fix:

-- | Find the least fixed point of @f@.  This implementation should produce
-- self-referential thunks, and thus run in constant space.
fix :: (a -> a) -> a
fix f = result
    where result = f result

cycle4 :: [a] -> [a]
cycle4 xs = fix (xs++)