STM and atomically: why the semantic of these two programs differ?

Let's consider this simple Haskell program:

module Main where

import Control.Concurrent.STM
import Control.Concurrent
import Control.Exception
import Control.Monad
import Data.Maybe
import Data.Monoid
import Control.Applicative


terminator :: Either SomeException () -> IO ()
terminator r = print $ "Dying with " <> show r

doStuff :: TMVar () -> TChan () -> Int -> IO ()
doStuff writeToken barrier w = void $ flip forkFinally terminator $ do
  hasWriteToken <- isJust <$> atomically (tryTakeTMVar writeToken)
  case hasWriteToken of
    True -> do
      print $ show w <> "I'm the lead.."
      threadDelay (5 * 10^6)
      print "Done heavy work"
      atomically $ writeTChan barrier ()
    False -> do
      print $ show w <> " I'm the worker, waiting for the barrier..."
      myChan <- atomically $ dupTChan barrier
      _ <- atomically $ readTChan myChan
      print "Unlocked!"



main :: IO ()
main = do
  writeToken <- newTMVarIO ()
  barrier <- newBroadcastTChanIO
  _ <- forM [1..20] (doStuff writeToken barrier)
  threadDelay (20 * 10^6)
  return ()

It essentially model a concurrency scenario where a "lead" acquire the write token, do something and the workers will sync on a barrier and way for the "green light" from the lead. This works, but if we replace worker "atomically" block with this:

  _ <- atomically $ do
    myChan <- dupTChan barrier
    readTChan myChan

All the workers remains blocked indefinitely inside a STM transaction:

"Done heavy work" 
"Dying with Right ()"
"Dying with Left thread blocked indefinitely in an STM transaction"
"Dying with Left thread blocked indefinitely in an STM transaction"
"Dying with Left thread blocked indefinitely in an STM transaction"
...

I suspect the key lies inside the semantic of atomically. Any idea? Thanks! Alfredo

Solution

I think this behavior comes from the definition of dupTChan. Copied here for readability, along with readTChan

dupTChan :: TChan a -> STM (TChan a)
dupTChan (TChan _read write) = do
  hole <- readTVar write  
  new_read <- newTVar hole
  return (TChan new_read write)

readTChan :: TChan a -> STM a
readTChan (TChan read _write) = do
  listhead <- readTVar read
  head <- readTVar listhead
  case head of
    TNil -> retry
    TCons a tail -> do
    writeTVar read tail
    return a

inlining those functions, we get this STM block:

worker_block (TChan _read write) = do
  hole <- readTVar write
  new_read <- newTVar hole
  listhead <- readTVar new_read
  head <- readTVar listhead
  case head of
    TNil -> retry
    ...

When you try to run this block atomically, we make a new read_end from the tail of the channel, then call readTVar on it. The tail is of course empty, so this readTVar will retry. However, when the lead writes to the channel, the act of writing to the channel invalidates this transaction! So every follower transaction will always have to retry.

In fact, I don't think there is any case where dupTChan >>= readTChan will ever result in anything other than the thread being blocked indefinitely on an STM transaction. You can reason this out from the documentation as well. dupTChan begins empty, so within a single atomic transaction it will never have any items unless that same transaction adds them.