For disclosure, I'm rather new to Haskell and am figuring out the syntax. Let me illustrate my question with an example. Let's say you have a function which returns a tuple of several values or a data type with multiple fields.
let
tuple = functionWhichReturnsTuple
value1 = fst tuple
value2 = snd tuple
in
otherFunction value1 value2
Once the line in the in
block is executed, would functionWhichReturnsTuple
be called twice, once for value1
and once for value2
? If so, what would be a better method of extracting multiple values at once without initiating multiple function calls?
This is indeed a subtle point. The key facts to remember here are:
From the point of view of the result, i.e. the semantics, it is irrelevant how many times functionWhichReturnsTuple
is called: even if the functionWhichReturnsTuple
is called twice, it will always return the same result since Haskell is referentially transparent. This is the main feature of pure functional programming: functions always return the same result when called with the same arguments.
From the point of view of performance, instead, it does matter. Calling a function more times means recomputing the same result, so it can slow down the program dramatically. If that happens at every step of a recursive function, the complexity might blow up from linear to exponential.
Now, the compiler tries to optimize performance, hence it will prefer avoiding recomputing results when possible. Except in a few obscure cases, it will compute definitions x = ....
only once. Let's consider your code:
tuple = functionWhichReturnsTuple
value1 = fst tuple
value2 = snd tuple
In Haskell, computation is triggered from outside, when a value is demanded: it might be because of pattern matching, because of some strict function (like +
on Int
s), because we are going to print a value on screen, etc.
Let's say value1
is demanded. GHC then computes fst tuple
, which in turn will demand tuple
to be computed, forcing the evaluation of functionWhichReturnsTuple
. When that, say, returns (2,73)
GHC effectively rewrites the definition to store the result:
tuple = (2, 73)
value1 = 2
value2 = snd tuple -- still to be evaluated
In this way, if later on value2
is demanded as well, we can access the already-computed tuple
and get 73
. No recomputation of the function.
(Nitpick: above I neglected the fact that, because of laziness, tuples might have only one component evaluated. But that's not really relevant for this question.)
Finally, above I mentioned "a few obscure cases". There are cases where the optimizer can not call the function only once. That might happen when the function is polymorphic ("generic"). Consider this code:
dup :: a -> (a,a)
dup x = (x,x)
tuple :: forall a . Num a => (a,a)
tuple = dup (1+2)
value1 :: forall a . Num a => a
value1 = fst tuple
value2 :: forall a . Num a => a
value2 = snd tuple
Here, types are polymorphic (forall a . ...
), and it's perfectly possible that later on we choose two distinct types for value1
and value2
!
For instance in print (value1 :: Int, value2 :: Double)
.
In this case, we must compute (1+2)
both on Int
s and on Double
s. Optimization can not prevent that. dup
also has to be called twice.
The problem here is that polymorphic values may look like plain values but are functions in disguise. tuple
above acts like a function, expecting to be passed the type a
, and returning a tuple inside that type. Above we saw that the definition of tuple
was rewritten, but that only happened because it was a non-function value: GHC does not attempt to rewrite the definition of functions, since next time they might be called with different arguments.
Since polymorphic values are disguised functions, they do not get rewritten and they are recomputed at each use.
Lastly, note that the Haskell definition does not mandate that recomputing happens or that it does not happen. The GHC optimizer is completely free to do anything, so what we discussed above is not completely set in stone. Usually, the optimizer does a great job. It always avoids to recompute non-polymorphic values. In some cases, it might even avoid recomputing polymorphic ones, but we should not rely on that too much.