Tags: performance, haskell, memory, lazy-evaluation, fold

Do Haskell’s strict folds really use linear space?


I thought I understood the basics of fold performance in Haskell, as described in foldr, foldl, foldl' on the Haskell Wiki and many other places. In particular, I learned that for accumulating functions one should use foldl' to avoid space leaks, and that the standard library functions are written to respect this. So I presumed that simple accumulators like length, applied to simple lists like replicate n 1, should require space that is constant (or at least sub-linear) in the length of the list. My intuition was that on sufficiently simple lists, they would behave roughly like a for loop in an imperative language.
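
For concreteness, the kind of definition I have in mind is a strict left fold, along these lines (the name is just for illustration; the same definition appears again in the experiments below):

import Data.List (foldl')

-- length as a strict left fold: the accumulator is forced at every step,
-- so no chain of thunks should build up
lengthViaFoldl' :: [a] -> Int
lengthViaFoldl' = foldl' (\n _ -> n + 1) 0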

But today I found that this seems not to hold in practice. For instance, length $ replicate n 1 seems to use space linear in n. In ghci:

ghci> :set +s
ghci> length $ replicate (10^6) 1
1000000
(0.02 secs, 56,077,464 bytes)
ghci> length $ replicate (10^7) 1
10000000
(0.08 secs, 560,078,360 bytes)
ghci> length $ replicate (10^8) 1
100000000
(0.61 secs, 5,600,079,312 bytes)
ghci> length $ replicate (10^9) 1
1000000000
(5.88 secs, 56,000,080,192 bytes)

Briefly, my question is: Do length and other strict folds really use linear space? If so, why? And is it inevitable? Below are more details of how I’ve played around trying to understand this, but they’re probably not worth reading — the tl;dr is that the linear-space usage seems to persist whatever variations I try.

(I originally used sum as the example function. As Willem Van Onsem points out, that was a badly-chosen example as default instances aren’t actually strict. However, the main question remains, since as noted below, this occurs with plenty of other functions that really are based on strict folds.)
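
To be clear about the distinction Willem is pointing at, this is the kind of strictness difference I mean, written out by hand (these are not base's actual definitions, just an illustration):

import Data.List (foldl')

-- a lazy left fold builds the nested thunk (((0 + x1) + x2) + ...) before forcing it
sumLazy :: Num a => [a] -> a
sumLazy = foldl (+) 0

-- foldl' forces the accumulator at every step, so the thunk chain never forms
sumStrict :: Num a => [a] -> a
sumStrict = foldl' (+) 0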

  • Replacing length with foldl' (\n _ -> n+1) 0 appears to make performance worse by a constant factor; space usage still seems to be linear.
  • Versions defined with foldl and foldr had worse memory usage (as expected), but only by a small constant factor, not asymptotically worse (as most discussions seem to suggest).
  • Replacing length with sum, last, or other simple accumulators, or with the obvious definitions of these using foldl', also doesn’t seem to change the linear space usage.
  • Using [1..n] as the test list, and other similar variations, also seems to make no significant difference.
  • Switching between the general versions of sum, foldl', etc. from Data.Foldable, the specialised ones in Data.List, and local versions defined directly by pattern-matching (sketched after this list) also seems to make no difference.
  • Compiling instead of working in ghci also only seemed to improve space usage by a constant factor.
  • Switching between several recent versions of GHC — 8.8.4, 8.10.5, and 9.0.1 — also seemed to make no significant difference.
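
Roughly the shape of the hand-rolled, pattern-matching versions just mentioned (a sketch, not verbatim what I ran; the bang patterns are there to keep the accumulator strict):

{-# LANGUAGE BangPatterns #-}

-- length by direct recursion, with a strict accumulator
myLength :: [a] -> Int
myLength = go 0
  where
    go !acc []     = acc
    go !acc (_:xs) = go (acc + 1) xs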

Solution

  • "Do they use linear space" is a slightly unclear question. Usually when we talk about the space an algorithm uses, we're talking about its working set: the maximum amount of memory it needs all at once. "If my computer only had X bytes of memory, could I run this program?" But that's not what GHCI's :set +s measures. It measures the sum of all memory allocations made, including those that were cleaned up partway through. And what is the biggest use of memory in your experiment? The list itself, of course.

    So you've really just measured the number of bytes that a list of size N takes up. You can confirm this by using last instead of length, which I hope you'll agree allocates no intermediate results and is strict. By your metric it takes the same amount of memory as length does: length does no extra allocation for its running count.
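
    A quick way to run that check yourself (I won't quote the byte counts here, since they vary with GHC version and platform):

    ghci> :set +s
    ghci> last $ replicate (10^7) 1
    ghci> length $ replicate (10^7) 1

    The allocation figures that +s reports for the two should come out essentially the same.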

    But a bigger problem is that GHCi is not an optimizing compiler. If you care about performance characteristics at all, GHCi is the wrong tool. Instead, use GHC with -O2, and turn on GHC's profiler.

    import System.Environment (getArgs)

    -- read the exponent n from the command line and count a list of 10^n elements
    main :: IO ()
    main = do
      n <- read . head <$> getArgs
      print $ length (replicate (10^n) 1)
    

    And running it:

    $ ghc -O2 -prof -fprof-auto stackoverflow.hs
    $ ./stackoverflow 6 +RTS -p
    1000000
    $ grep "total alloc" stackoverflow.prof
        total alloc =      54,856 bytes  (excludes profiling overheads)
    $ ./stackoverflow 9 +RTS -p
    1000000000
    $ grep "total alloc" stackoverflow.prof
        total alloc =      55,008 bytes  (excludes profiling overheads)
    

    we can see that total allocation is roughly constant despite a thousand-fold increase in input size.

    Will Ness correctly points out in a comment that -s would be a better measuring tool than -p.
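
    For reference, that looks roughly like this, reusing the binary built above:

    $ ./stackoverflow 9 +RTS -s

    Unlike -p, the -s RTS option needs no profiling flags at compile time; on exit it prints a memory summary that includes both total heap allocation and the "maximum residency" figure, which is the working-set number discussed at the start.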