Search code examples
performancehaskellconcatenationlazy-evaluationassociativity

Concatenation order and performance under Lazy Evaluation in Haskell


Consider the following two execution orders:

a ++ (b ++ c)

and

(a ++ b) ++ c

Why is the first execution order faster than the second?

I'm new to Haskell, hoping for a detailed explanation, thanks!


Solution

  • When fully evaluating the result of x ++ y, you must pay:

    1. The cost of fully evaluating x.
    2. The cost of fully evaluating y.
    3. The cost of traversing x once while executing the (++) operation.

    Now let's compare a ++ (b ++ c) and (a ++ b) ++ c. Let's write the cost of evaluating a as CA (and similarly CB, CC), and the cost of traversing a as TA (and similarly TB, TC). Then the cost of fully evaluating a ++ (b ++ c) is:

    1. The cost of a, CA.
    2. The cost of b ++ c, which is
      1. CB
      2. CC
      3. One traversal of b, TB.
    3. TA

    This is a grand total of CA+CB+CC+TA+TB. Now, for (a ++ b) ++ c:

    1. The cost of a ++ b, which is
      1. CA
      2. CB
      3. TA
    2. CC
    3. One traversal of a ++ b, which is TA+TB.

    This is a grand total of CA+CB+CC+2*TA+TB. Compared to the other order, there is an extra traversal of a, for an extra cost of TA, so this order is more expensive.

    I leave it to the reader to try longer chains and start figuring out the pattern. In short, the bad association gives a quadratic amount of traversal as it redoes all the traversals it has already done, plus one more, at each invocation of (++), while the good association traverses each list at most once.