I have a simple concurrency requirement: a piece of data (say, file contents from the filesystem) should be loaded on demand, when a consumer first tries to read it, and should be loaded only once. There are multiple concurrent consumers that should all get the same data, but only one (the quickest) initiates the load; the others wait until it is loaded and then consume it.
So the question is: what is the best, most typical approach to implementing this in Haskell?
I understand that I need some kind of lock that allows one write and multiple reads, so I can represent the holder of the data as a wrapper MVar that implements the locking. This outer var is empty at first; the first consumer fills it with an inner data var and then loads the actual data into that, while the others wait.
import Control.Concurrent.MVar

-- Assumption: MyData stands in for the real payload type; String matches readFile.
type MyData = String

type WrapVar = MVar (MVar MyData)

readData :: WrapVar -> FilePath -> IO MyData
readData wrapVar fname = do
  mbDataVar <- tryReadMVar wrapVar
  case mbDataVar of
    Just dataVar ->
      -- dataVar is already there: wait until it is loaded
      readMVar dataVar
    Nothing -> do
      -- dataVar is not there yet: create a new empty one and try to put it
      dataVar <- newEmptyMVar
      ok <- tryPutMVar wrapVar dataVar
      if ok
        then do
          -- load the actual data; this happens only once,
          -- because only one consumer can put dataVar into wrapVar
          myData <- readFile fname
          putMVar dataVar myData
          pure myData
        else
          -- we failed to put the MVar (some other consumer did it first),
          -- so we just recursively ask for the same read
          readData wrapVar fname
I wonder whether this is a good way to implement the described requirement, or whether there is a more idiomatic way to do it.
If you expect to write once but read many times, you might consider going the optimistic concurrency route and using STM. Here's an idea of what that might look like.
{-# LANGUAGE LambdaCase #-}
import Control.Concurrent.STM

data Status = Uninitialized | Initializing | Initialized MyData

-- | Users should not call this themselves; readData does it for them.
initializeData :: TVar Status -> FilePath -> IO MyData
initializeData tvar fname = do
  d <- readFile fname
  d <$ atomically (writeTVar tvar (Initialized d))

readData :: TVar Status -> FilePath -> IO MyData
readData tvar fname = do
  -- the transaction decides what to do and returns the IO action to run
  act <- atomically $ readTVar tvar >>= \case
    Initialized d -> pure (pure d)
    Initializing  -> retry
    Uninitialized -> initializeData tvar fname <$ writeTVar tvar Initializing
  act
Generally, STM should be faster than MVars when contention is low, and contention is never lower than when there are only readers. If this turns out to be a bottleneck (...that seems unlikely to me), you can even add in a fast path that uses readTVarIO, succeeds when it gets an Initialized, and falls back to this for the other values.
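The fast path described above might look roughly like the sketch below. It is only an illustration under a few assumptions: `MyData` is taken to be `String`, the names `readDataSlow` and `readDataFast` are made up for this example, the file name is passed in explicitly, and the lazy `readFile` result is forced with `evaluate` before it is published (an addition that the code above does not have).

```haskell
{-# LANGUAGE LambdaCase #-}
import Control.Concurrent.STM
import Control.Exception (evaluate)

-- Assumption: MyData is a stand-in for the real payload type.
type MyData = String

data Status = Uninitialized | Initializing | Initialized MyData

-- Slow path: the transactional read, as in the answer above.
-- (readDataSlow is a hypothetical name for this sketch.)
readDataSlow :: TVar Status -> FilePath -> IO MyData
readDataSlow tvar fname = do
  act <- atomically $ readTVar tvar >>= \case
    Initialized d -> pure (pure d)
    Initializing  -> retry
    Uninitialized -> load <$ writeTVar tvar Initializing
  act
  where
    load = do
      d <- readFile fname
      _ <- evaluate (length d)  -- force the lazy read before publishing it
      d <$ atomically (writeTVar tvar (Initialized d))

-- Fast path: a plain read outside any transaction. It only falls back
-- to the transactional version while the data is not yet available.
readDataFast :: TVar Status -> FilePath -> IO MyData
readDataFast tvar fname =
  readTVarIO tvar >>= \case
    Initialized d -> pure d
    _             -> readDataSlow tvar fname
```

To use it, create the shared variable once with `newTVarIO Uninitialized` and hand the same `TVar` to every consumer thread. Note that, like the code above, this sketch does not reset the status if the load throws, so waiting readers would block in that case.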