Search code examples
haskelllazy-evaluationthunk

Witnessing how frequently a value has been evaluated in Haskell


With a little bit of unsafe, you can see how much of a lazy value has been evaluated in Haskell

import Data.IORef
import System.IO.Unsafe

data Nat = Z | S Nat
  deriving (Eq, Show, Read, Ord)

natTester :: IORef Nat -> Nat
natTester ref =
  let inf = natTester ref
   in unsafePerformIO $ do
        modifyIORef ref S
        pure $ S inf

newNatTester :: IO (Nat, IORef Nat)
newNatTester = do
  ref <- newIORef Z
  pure (natTester ref, ref)

howMuchWasEvaled :: (Nat -> b) -> IO Nat
howMuchWasEvaled f = do
  (inf, infRef) <- newNatTester
  f inf `seq` readIORef infRef

With:

ghci> howMuchWasEvaled $ \x -> x > S (S Z)
S (S (S Z))

Indicating that only the first four constructors of infinity :: Nat were evaluated.

If x is used twice, we still get the total evaluation that was needed:

> howMuchWasEvaled $ \x -> x > Z && x > S (S Z)
S (S (S Z))

This makes sense - once we have evaluated x up to a point, we don't have to start all over again. The thunk has already been forced.

But is there a way of checking how many times constructors were evaluated? I.e., a function magic that behaves like this:

> magic $ \x -> x > Z 
S Z
> magic $ \x -> x > Z && x > Z
S (S Z)
...

I understand this may involve compiler flags (maybe no-cse), inline pragmas, very unsafe functions, etc.

EDIT: Carl pointed out that I may have been insufficiently clear on the constraints of what I was looking for. The requirement is that the function that magic is given as an argument cannot be altered (though it can be assumed to be lazy in its argument). magic would be part of a library that you could call with your own functions.

That said, GHC-specific hacks and things that only work unreliably are definitely still game.


Solution

  • As expressed, that cannot be done in ghc. Two uses of the same name, such as x in your example, will always be shared with ghc's implementation of haskell's evaluation model. That's a guarantee that provides a key building block for ensuring sharing is happening. At the very least, getting this to do what you want will require passing in multiple values, one for each independent place you want to use the named value.

    Then you'll have to ensure that on the calling side, the values aren't accidentally shared before being passed in to the function. This can be done, but it's the place that might require the use of options like -fno-cse or -fno-full-laziness, depending on how you implement it and what optimization level ghc is running at.

    Here's a small modification of your starting point that works in ghci, at least:

    {-# OPTIONS_GHC -fno-full-laziness #-}
    
    import Data.IORef
    import System.IO.Unsafe
    
    data Nat = Z | S Nat
        deriving (Eq, Show, Read, Ord)
    
    natTester :: IORef Nat -> Nat
    natTester ref =
        let inf = natTester ref
        in unsafePerformIO $ do
            modifyIORef ref S
            pure $ S inf
    
    newNatTester :: IO ((a -> Nat), IORef Nat)
    newNatTester = do
        ref <- newIORef Z
        pure (\x -> x `seq` natTester ref, ref)
    
    howMuchWasEvaled :: ((a -> Nat) -> b) -> IO Nat
    howMuchWasEvaled f = do
        (infGen, infRef) <- newNatTester
        f infGen `seq` readIORef infRef
    

    Use in ghci:

    *Main> howMuchWasEvaled $ \gen -> let x = gen 1 ; y = gen 2 in x > Z && y > Z
    S (S Z)
    *Main> howMuchWasEvaled $ \gen -> let x = gen 1 in x > Z && x > Z
    S Z
    

    I replaced passing a single infinity into the function with passing it an infinity generator. The generator doesn't care what argument it's called with, as long as it's not a bottom value. (The seq is there to be sure the function actually uses its argument, to prevent a number of optimizations ghc might go for if the argument was unused.) As long as it's called with a different value every time, ghc won't be able to cse it away, because the expressions are different. If used with optimizations, full laziness might interfere by floating natTester ref out of the lambda in newNatTester. To prevent that, I have added a pragma to turn off that optimization within this module. It doesn't matter in ghci by default, as it doesn't use optimizations. It might matter if this module gets compiled, so I threw the pragma in just to be sure.