Search code examples
haskellreturn-valuefunction-call

Does extracting values from a multiple Value return in Haskell invoke the function more than once?


For disclosure, I'm rather new to Haskell and am figuring out the syntax. Let me illustrate my question with an example. Let's say you have a function which returns a tuple of several values or a data type with multiple fields.

let
    tuple = functionWhichReturnsTuple
    value1 = fst tuple
    value2 = snd tuple
in
    otherFunction value1 value2

Once the line in the in block is executed, would functionWhichReturnsTuple be called twice, once for value1 and once for value2? If so, what would be a better method of extracting multiple values at once without initiating multiple function calls?


Solution

  • This is indeed a subtle point. The key facts to remember here are:

    1. From the point of view of the result, i.e. the semantics, it is irrelevant how many times functionWhichReturnsTuple is called: even if the functionWhichReturnsTuple is called twice, it will always return the same result since Haskell is referentially transparent. This is the main feature of pure functional programming: functions always return the same result when called with the same arguments.

    2. From the point of view of performance, instead, it does matter. Calling a function more times means recomputing the same result, so it can slow down the program dramatically. If that happens at every step of a recursive function, the complexity might blow up from linear to exponential.

    Now, the compiler tries to optimize performance, hence it will prefer avoiding recomputing results when possible. Except in a few obscure cases, it will compute definitions x = .... only once. Let's consider your code:

    tuple = functionWhichReturnsTuple
    value1 = fst tuple
    value2 = snd tuple
    

    In Haskell, computation is triggered from outside, when a value is demanded: it might be because of pattern matching, because of some strict function (like + on Ints), because we are going to print a value on screen, etc. Let's say value1 is demanded. GHC then computes fst tuple, which in turn will demand tuple to be computed, forcing the evaluation of functionWhichReturnsTuple. When that, say, returns (2,73) GHC effectively rewrites the definition to store the result:

    tuple = (2, 73)
    value1 = 2
    value2 = snd tuple  -- still to be evaluated
    

    In this way, if later on value2 is demanded as well, we can access the already-computed tuple and get 73. No recomputation of the function.

    (Nitpick: above I neglected the fact that, because of laziness, tuples might have only one component evaluated. But that's not really relevant for this question.)

    Finally, above I mentioned "a few obscure cases". There are cases where the optimizer can not call the function only once. That might happen when the function is polymorphic ("generic"). Consider this code:

    dup :: a -> (a,a)
    dup x = (x,x)
    
    tuple :: forall a . Num a => (a,a)
    tuple = dup (1+2)
    value1 :: forall a . Num a => a
    value1 = fst tuple
    value2 :: forall a . Num a => a
    value2 = snd tuple
    

    Here, types are polymorphic (forall a . ...), and it's perfectly possible that later on we choose two distinct types for value1 and value2!

    For instance in print (value1 :: Int, value2 :: Double).

    In this case, we must compute (1+2) both on Ints and on Doubles. Optimization can not prevent that. dup also has to be called twice.

    The problem here is that polymorphic values may look like plain values but are functions in disguise. tuple above acts like a function, expecting to be passed the type a, and returning a tuple inside that type. Above we saw that the definition of tuple was rewritten, but that only happened because it was a non-function value: GHC does not attempt to rewrite the definition of functions, since next time they might be called with different arguments. Since polymorphic values are disguised functions, they do not get rewritten and they are recomputed at each use.

    Lastly, note that the Haskell definition does not mandate that recomputing happens or that it does not happen. The GHC optimizer is completely free to do anything, so what we discussed above is not completely set in stone. Usually, the optimizer does a great job. It always avoids to recompute non-polymorphic values. In some cases, it might even avoid recomputing polymorphic ones, but we should not rely on that too much.