Search code examples
haskellseq

Why haskell seq function makes first argument to WHNF?


I thinks that seq is hard to understand because it makes first argument to WHNF.

If seq makes first argument to normal form, it will be easy to understand.

What is the benefit of using WHNF over NF?

Why haskell committee choose using WHNF in seq function?


Solution

  • First, Haskell's API has no preference between WHNF and NF : it provides both seq and deepseq.

    So maybe your question is why does WHNF exist at all ? Simply put, it is the least evaluation possible. A full calculation would go like this :

    1. unevaluated (thunk)
    2. WHNF : outmost constructor known
    3. deeper evaluations
    4. NF : the value is completely calculated

    However, Haskell is lazy, it tries to do as little calculation as possible to get its final result. For example, to compute the length of a list l, Haskell must know whether l is [] or _ : t. Then recursively the same question about t. Therefore it only needs the number of constructors : before the final constructor []. This is by definition successive WHNFs, which only extract the outmost constructor of a value.

    The outmost constructor is the minimal information you must know about a value to actually do something with it, such as selecting a branch in a case of, like in length just above. That is the interest of WHNF over NF.

    Note that for simple types like Int, each value is its own constructor (2, -7, 15, ...), so WHNF = NF for them.

    Finally, you rarely care about all this : it is the compiler's job to execute your program as fast as possible. In the evaluation of foldl (+) 0 [1..100], if you try to figure out when the list [1..100] finishes in NF you will be completely wrong. The compiler transforms that fold into this tail-recursive loop

    let sum ret i =
                  if i == 100 then 100 + ret
                  else sum (ret+i) (i+1) in
      sum 0 1
    

    which means it doesn't evaluate the list at all. Unfortunately there are situations where the compiler cannot know that a value will always be in NF when the final result of the program is produced. Then it is a waste of time and RAM to keep it unevaluated (forming a thunk has a cost) : better deepseq it right from the start. Or your compiler may have performance bugs and you must help it with seq or deepseq. A profiler will tell you what parts of your code are slow.