Strict evaluation techniques for concurrent channels in Haskell

I'm toying with Haskell threads, and I'm running into the problem of communicating lazily-evaluated values across a channel. For example, with N worker threads and 1 output thread, the workers communicate unevaluated work and the output thread ends up doing the work for them.

I've read about this problem in various documentation and seen various solutions, but I only found one solution that works and the rest do not. Below is some code in which worker threads start some computation that can take a long time. I start the threads in descending order, so that the first thread should take the longest, and the later threads should finish earlier.

import Control.Concurrent (forkIO)
import Control.Concurrent.Chan   -- .Strict
import Control.Concurrent.MVar
import Control.Exception (finally, evaluate)
import Control.Monad (forM_)
import Control.Parallel.Strategies (using, rdeepseq)

main = (>>=) newChan $ (>>=) (newMVar []) . run

run :: Chan (Maybe String) -> MVar [MVar ()] -> IO ()
run logCh statVars = do
  logV <- spawn1 readWriteLoop
  say "START"
  forM_ [18,17..10] $ spawn . busyWork
  await
  writeChan logCh Nothing -- poison the logger
  takeMVar logV
  putStrLn "DONE"
  where
    say mesg = force mesg >>= writeChan logCh . Just

    force s = mapM evaluate s  -- works
--    force s = return $ s `using` rdeepseq  -- no difference
--    force s = return s -- no-op; try this with strict channel

    busyWork = say . show . sum . filter odd . enumFromTo 2 . embiggen
    embiggen i = i*i*i*i*i

    readWriteLoop = readChan logCh >>= writeReadLoop
    writeReadLoop Nothing = return ()
    writeReadLoop (Just mesg) = putStrLn mesg >> readWriteLoop

    spawn1 action = do
      v <- newEmptyMVar
      forkIO $ action `finally` putMVar v ()
      return v

    spawn action = do
      v <- spawn1 action
      modifyMVar statVars $ \vs -> return (v:vs, ())

    await = do
      vs <- modifyMVar statVars $ \vs -> return ([], vs)
      mapM_ takeMVar vs

Using most techniques, the results are reported in the order spawned; that is, the longest-running computation first. I interpret this to mean that the output thread is doing all the work:

-- results in order spawned (longest-running first = broken)
START
892616806655
503999185040
274877906943
144162977343
72313663743
34464808608
15479341055
6484436675
2499999999
DONE

I thought the answer to this would be strict channels, but they didn't work. I understand that WHNF for strings is insufficient because that would just force the outermost constructor (nil or cons for the first character of the string). The rdeepseq is supposed to fully evaluate, but it makes no difference. The only thing I've found that works is to map Control.Exception.evaluate :: a -> IO a over all the characters in the string. (See the force function comments in the code for several different alternatives.) Here's the result with Control.Exception.evaluate:

-- results in order finished (shortest-running first = correct)
START
2499999999
6484436675
15479341055
34464808608
72313663743
144162977343
274877906943
503999185040
892616806655
DONE

So why don't strict channels or rdeepseq produce this result? Are there other techniques? Am I misinterpreting why the first result is broken?

Solution

There are two issues going on here.

The reason the first attempt (using an explicit rnf) doesn't work is that, by using return, you've created a thunk that fully evaluates itself when it is evaluated, but the thunk itself has not being evaluated. Notice that the type of evaluate is a -> IO a: the fact that it returns a value in IO means that evaluate can impose ordering:

return (error "foo")   >> return 1 == return 1
evaluate (error "foo") >> return 1 == error "foo"

The upshot is that this code:

force s = evaluate $ s `using` rdeepseq

will work (as in, have the same behavior as mapM_ evaluate s).

The case of using strict channels is a little trickier, but I believe this is due to a bug in strict-concurrency. The expensive computation is actually being run on the worker threads, but it's not doing you much good (you can check for this explicitly by hiding some asynchronous exceptions in your strings and seeing which thread the exception surfaces on).

What's the bug? Let's take a look at the code for strict writeChan:

writeChan :: NFData a => Chan a -> a -> IO ()
writeChan (Chan _read write) val = do
  new_hole <- newEmptyMVar
  modifyMVar_ write $ \old_hole -> do
    putMVar old_hole $! ChItem val new_hole
    return new_hole

We see that modifyMVar_ is called on write before we evaluate the thunk. The sequence of operations then is:

writeChan is entered
We takeMVar write (blocking anyone else who wants to write to the channel)
We evaluate the expensive thunk
We put the expensive thunk onto the channel
We putMVar write, unblocking all of the other threads

You don't see this behavior with the evaluate variants, because they perform the evaluation before the lock is acquired.

~~I’ll send Don mail about this and see if he agrees that this behavior is kind of suboptimal.~~

Don agrees that this behavior is suboptimal. We're working on a patch.