Tags: haskell, recursion, memory, lazy-evaluation, memoization

Performance difference of function and data recursion in Haskell


While thinking about this other question, I realized that the following functions smoothSeq and smoothSeq'

smoothSeq :: (Integer, Integer, Integer) -> [Integer]
smoothSeq (a, b, c) = result
  where
    result = 1 : union timesA (union timesB timesC)
    timesA = map (* a) $ result
    timesB = map (* b) $ result
    timesC = map (* c) $ result

smoothSeq' :: (Integer, Integer, Integer) -> [Integer]
smoothSeq' (a, b, c) = 1 : union timesA (union timesB timesC)
  where
    timesA = map (* a) $ smoothSeq' (a, b, c)
    timesB = map (* b) $ smoothSeq' (a, b, c)
    timesC = map (* c) $ smoothSeq' (a, b, c)

-- | Merge two sorted sequences discarding duplicates.
union :: (Ord a) => [a] -> [a] -> [a]
union [] ys = ys
union xs [] = xs
union (x : xs) (y : ys)
  | x < y = x : union xs (y : ys)
  | x > y = y : union (x : xs) ys
  | otherwise = x : union xs ys

have drastically different performance characteristics:

ghci> smoothSeq (2,3,5) !! 500
944784
(0.01 secs, 311,048 bytes)
ghci> smoothSeq' (2,3,5) !! 500
944784
(11.53 secs, 3,745,885,224 bytes)

My impression is that smoothSeq is linear in time and memory (as regularSeq was) because result is shared in the recursive definition, whereas smoothSeq' is super-linear because the recursive function definition spawns a tree of calls that independently recompute the earlier terms of the sequence over and over; there is no sharing/memoization of the previous terms, much like naive Fibonacci.
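
For reference, the naive-Fibonacci analogy would look like this (an illustrative sketch of mine; fibs and fib' are just names I am using here, not code from either question):

fibs :: [Integer]
fibs = result
  where
    -- the list is bound to a name, so each element is computed at most once
    result = 0 : 1 : zipWith (+) result (tail result)

fib' :: Int -> Integer
-- every call spawns two further calls, recomputing the same prefixes
fib' 0 = 0
fib' 1 = 1
fib' n = fib' (n - 1) + fib' (n - 2)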

While looking around for a detailed explanation, I came across these examples (among others):

fix f = x where x = f x
fix' f = f (fix f)

cycle xs = res where res = xs ++ res
cycle' xs = xs ++ cycle' xs

where again the non-primed version (the one without the ' suffix) is apparently more efficient because it reuses the previously computed result.
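
To make that concrete, here is a small sketch of my own (assuming Prelude's cycle is hidden so the definition above compiles): both versions give the same answers, and only the amount of allocation differs.

import Prelude hiding (cycle)

cycle, cycle' :: [a] -> [a]
cycle xs = res where res = xs ++ res
cycle' xs = xs ++ cycle' xs

-- cycle copies xs once and then loops back on the named res (a circular
-- structure), while cycle' allocates a fresh copy of xs on every unfolding.
main :: IO ()
main = print (take 7 (cycle [1, 2, 3]), take 7 (cycle' [1, 2, 3]))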

From what I can see, what differentiates the two versions is whether the recursion happens through a function or through data (more precisely, a function binding or a pattern binding). Is that enough to explain the difference in behavior? What is the underlying principle that dictates whether something is memoized or not? I couldn't find a definitive and comprehensive answer in the Haskell 2010 Language Report or elsewhere.


Edit: here is another simple example I came up with:

arithSeq start step = result
  where
    result = start : map (+ step) result
arithSeq' start step = start : map (+ step) (arithSeq' start step)

ghci> arithSeq 10 100 !! 10000
1000010
(0.01 secs, 1,443,520 bytes)
ghci> arithSeq' 10 100 !! 10000
1000010
(1.30 secs, 5,900,741,048 bytes)

The naive recursive definition arithSeq' performs far worse than arithSeq, where the recursion "happens on data".
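
One way to make the recomputation visible is to instrument both variants with Debug.Trace (a sketch of mine; arithSeqT and arithSeqT' are illustrative names): asking for an element of the named version prints its message once, while the primed version prints roughly one message per index, because every element forces yet another nested call.

import Debug.Trace (trace)

arithSeqT :: Integer -> Integer -> [Integer]
arithSeqT start step = result
  where
    -- result is a single shared thunk, so this message fires only once
    result = trace "arithSeqT: building result" (start : map (+ step) result)

arithSeqT' :: Integer -> Integer -> [Integer]
arithSeqT' start step =
  -- every recursive call is a fresh thunk, so this fires once per call
  trace "arithSeqT': new call" (start : map (+ step) (arithSeqT' start step))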


Solution

  • As a rule of thumb, when you bind something to a name, all references to that name in the same scope refer to a single object in memory, whereas multiple identical expressions (anything more complicated than a reference to a single name) are different objects in memory that merely happen to have the same value.1
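
    A tiny, self-contained illustration of that rule (shared and unshared are hypothetical names, and this is subject to the caveats in footnote 1 below): in shared the expensive sum is bound to a name, so it is computed once and reused by the addition; in unshared the two syntactically identical expressions are separate thunks that are each evaluated on their own.

    shared :: Integer
    shared = x + x
      where
        -- one named thunk, evaluated once and reused for both occurrences
        x = sum [1 .. 1000000]

    unshared :: Integer
    unshared = sum [1 .. 1000000] + sum [1 .. 1000000]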

    So let's see how that general principle applies to your code:

    smoothSeq :: (Integer, Integer, Integer) -> [Integer]
    smoothSeq (a, b, c) = result
      where
        result = 1 : union timesA (union timesB timesC)
        timesA = map (* a) $ result
        timesB = map (* b) $ result
        timesC = map (* c) $ result
    

    Here you have given the name result to the value 1 : union timesA (union timesB timesC). All the other places where result is used are in the same scope (the local scope within a given application of smoothSeq), so they all refer to a single shared value. That means that when timesA is evaluated some more and triggers some more evaluation of result, timesB and timesC will subsequently see the results of that work already evaluated inside result.
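
    One way to watch this sharing happen is GHCi's :sprint command, which prints a binding without forcing it. Here is a sketch with a simpler self-referential list (the output shown is indicative; the exact format can vary between GHC versions):

    ghci> let xs = 1 : map (* 2) xs :: [Integer]
    ghci> :sprint xs
    xs = _
    ghci> xs !! 3
    8
    ghci> :sprint xs
    xs = 1 : 2 : 4 : 8 : _

    Forcing element 3 also forced elements 0 through 2 (each element depends on the previous one), and all of that evaluated structure now lives inside the single shared xs, visible to anything else that refers to it.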

    And in this one:

    smoothSeq' :: (Integer, Integer, Integer) -> [Integer]
    smoothSeq' (a, b, c) = 1 : union timesA (union timesB timesC)
      where
        timesA = map (* a) $ smoothSeq' (a, b, c)
        timesB = map (* b) $ smoothSeq' (a, b, c)
        timesC = map (* c) $ smoothSeq' (a, b, c)
    

    Here you don't have a single name for the expression smoothSeq' (a, b, c); you've just written that expression multiple times. Each of those expressions is an independent call, so each is represented by a separate object in memory, and evaluating one has no impact on the others.

    Furthermore, the scope in which timesA, timesB, and timesC are defined is the scope inside an application of smoothSeq', after it has received its argument. That means every application of smoothSeq' has its own timesA, timesB, and timesC, and the recursive calls inside those definitions each create yet another such scope of their own. So every call forks into 3 more calls, unlike the version with result, which only entered smoothSeq once. This will indeed be super-linear.
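
    Unfolding one step of the definition makes that fan-out visible (an illustrative expansion, not code from the question):

    -- smoothSeq' (a, b, c)
    --   = 1 : union (map (* a) (smoothSeq' (a, b, c)))
    --               (union (map (* b) (smoothSeq' (a, b, c)))
    --                      (map (* c) (smoothSeq' (a, b, c))))
    -- ...and each of the three inner calls unfolds into three more of its own.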

    I think it's also worth comparing this to the similarly-structured code in your other question:

    regularSeq :: [Integer]
    regularSeq = 1 : union timesTwo (union timesThree timesFive)
      where
        timesTwo   = map (* 2) regularSeq
        timesThree = map (* 3) regularSeq
        timesFive  = map (* 5) regularSeq
    

    This is quite like smoothSeq, in that the 3 inner definitions all work with a single name, regularSeq, and thus refer to a single object. But here the scope of that name/object is quite different. In smoothSeq the shared variable result is defined inside an application of smoothSeq. Thus there is a separate result every time smoothSeq is called, but it is shared between all the internal definitions timesA, timesB, and timesC. In regularSeq, the shared name is a name in the global scope; it will thus be shared for the entire lifetime of the program.
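
    One observable consequence of that scope difference (a small sketch of my own, assuming both definitions are in scope in one module, and again subject to footnote 1, since the compiler could in principle common up the two identical calls): repeated uses of regularSeq reuse whatever prefix has already been evaluated, while each call of smoothSeq builds a fresh result that is discarded afterwards.

    main :: IO ()
    main = do
      print (regularSeq !! 5000)           -- does the work
      print (regularSeq !! 5000)           -- reuses the already-evaluated prefix
      print (smoothSeq (2, 3, 5) !! 5000)  -- builds its own result from scratch
      print (smoothSeq (2, 3, 5) !! 5000)  -- ...and so does this second call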

    Hopefully this illustrates that the key things to think about when trying to predict what is shared are the variables and their scopes.


    1 Technically the compiler may inline the definition of a name into its usage sites, and thus create distinct objects even when you used the same name. It might also apply common subexpression elimination, so that multiple identical expressions end up referring to the same value. It could even notice that an expression in a local scope doesn't depend on the arguments of the function it occurs inside, and lift the expression out of the function so that it becomes a single object in memory shared across calls of that function. So if we're being precise, you cannot be sure just by looking at the source code, and that's why there isn't an absolutely rigorous reference for what will be shared. The language semantics promise you a result; they don't promise you any particular evaluation strategy.

    However, these transformations are applied as optimisations. The compiler tries to make these changes only when they improve the performance of your code; the GHC developers would likely regard it as a compiler bug if any of them were applied and made a large negative difference in performance. So it's pretty safe to just think of it as "if you named it then it's the same object, otherwise they are different objects". Any deviation is unlikely to matter; the compiler might surprise you by running faster than you expected from the simple rule of thumb, but it probably won't violate your expectations in a way that creates a large problem. You generally only need to worry about exactly what it does if you're trying to squeeze every last bit of performance out of your code.
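
    For example, the last transformation mentioned above (lifting an expression out of the function it occurs in) could apply to something like this hypothetical sketch, where table does not mention n, so GHC may float it to the top level and share it across all calls of lookupSquare, even though the source suggests it is rebuilt on every call:

    lookupSquare :: Int -> Integer
    lookupSquare n = table !! n
      where
        -- does not depend on n, so full laziness may float it out and share it
        table = map (^ 2) [0 .. 999]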