
Concurrent write-once, read-many access to data


I have a simple concurrency requirement: a piece of data (say, file contents from the filesystem) should be loaded on demand, when a consumer first tries to read it, and should be loaded only once. There are multiple concurrent consumers that should all get the same data, but only one (the "quickest") initiates the load; all the others wait until it is loaded and then consume it.

So the question is: what is the best and most typical approach to implement this in Haskell?

I understand that I need some kind of lock that allows a single write and multiple reads, so I can represent the holder of the data as a wrapper MVar that implements the locking. The wrapper is empty at first; the "first" consumer fills it with a data MVar and then loads the actual data into that inner MVar, while the others wait on it.

import Control.Concurrent.MVar

-- assumption: MyData is String, since readFile returns a String
type MyData = String

type WrapVar = MVar (MVar MyData)

readData :: WrapVar -> FilePath -> IO MyData
readData wrapVar fname = do
  mbDataVar <- tryReadMVar wrapVar
  case mbDataVar of
    Just dataVar ->
      -- dataVar is already there: wait until it is loaded
      readMVar dataVar
    Nothing -> do
      -- dataVar is not there yet: create a new empty one and try to put it
      dataVar <- newEmptyMVar
      ok <- tryPutMVar wrapVar dataVar
      if ok
        then do
          -- we load the actual data; it is loaded only once,
          -- because only one consumer can put dataVar into wrapVar
          myData <- readFile fname
          putMVar dataVar myData
          pure myData
        else
          -- we failed to put the MVar (some other consumer did it first),
          -- so we just recursively retry the same read
          readData wrapVar fname


I wonder whether this is a good way to implement the described requirement, or whether there is a more idiomatic way to do it.


Solution

  • If you expect to write once but read many times, you might consider going the optimistic concurrency route and using STM. Here's an idea of what that might look like.

    {-# LANGUAGE LambdaCase #-}
    import Control.Concurrent.STM

    data Status = Uninitialized | Initializing | Initialized MyData
    
    -- | users should not call this themselves, it will be done for them by readData
    initializeData :: TVar Status -> FilePath -> IO MyData
    initializeData tvar fname = do
        d <- readFile fname
        d <$ atomically (writeTVar tvar (Initialized d))
    
    -- | the transaction returns an IO action: either "return the data" or
    -- "go and load it"; only the thread that saw Uninitialized gets the latter
    readData :: TVar Status -> FilePath -> IO MyData
    readData tvar fname = do
        act <- atomically $ readTVar tvar >>= \case
            Initialized d -> pure (pure d)
            Initializing -> retry
            Uninitialized -> initializeData tvar fname <$ writeTVar tvar Initializing
        act
    

    Generally, STM should be faster than MVars when contention is low, and contention is never lower than when there are only readers. If this turns out to be a bottleneck (which seems unlikely to me), you can even add a fast path that uses readTVarIO, succeeds when it gets an Initialized, and falls back to this for the other values.
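
The fast path mentioned above could look roughly like the sketch below. It is only an illustration under a few assumptions: Status is made polymorphic and the loader is passed in as an arbitrary IO action (instead of readFile on a fixed path) so the example needs no real file, and the names readDataSlow, readDataFast and demo are hypothetical. It relies on readTVarIO from the stm package, which reads the current value of a TVar without running a full transaction.

```haskell
{-# LANGUAGE LambdaCase #-}
import Control.Concurrent (forkIO, threadDelay)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Control.Concurrent.STM
import Control.Monad (forM_, join, replicateM)
import Data.IORef (atomicModifyIORef', newIORef, readIORef)

-- Same three states as in the answer, but polymorphic in the payload
-- (an assumption of this sketch, so it does not depend on MyData).
data Status a = Uninitialized | Initializing | Initialized a

-- Slow path: the transaction from the answer, with the loader passed in.
-- Exactly one caller sees Uninitialized and receives the loading action;
-- callers that see Initializing block in retry until the TVar changes.
readDataSlow :: TVar (Status a) -> IO a -> IO a
readDataSlow tvar load = join . atomically $ readTVar tvar >>= \case
    Initialized d -> pure (pure d)
    Initializing  -> retry
    Uninitialized -> initialize <$ writeTVar tvar Initializing
  where
    initialize = do
        d <- load
        d <$ atomically (writeTVar tvar (Initialized d))

-- Fast path: a plain read outside any transaction. Once the data is
-- Initialized, readers never enter STM at all.
readDataFast :: TVar (Status a) -> IO a -> IO a
readDataFast tvar load = readTVarIO tvar >>= \case
    Initialized d -> pure d
    _             -> readDataSlow tvar load

-- Hypothetical demo: n concurrent consumers share one TVar. The loader
-- counts how often it runs; the result is (number of loads, values read).
demo :: Int -> IO (Int, [String])
demo n = do
    tvar  <- newTVarIO Uninitialized
    loads <- newIORef (0 :: Int)
    let load = do
            atomicModifyIORef' loads (\k -> (k + 1, ()))
            threadDelay 10000          -- pretend the load takes a while
            pure "file contents"
    -- one result slot per consumer, filled when that consumer finishes
    slots <- replicateM n newEmptyMVar
    forM_ slots $ \slot -> forkIO (readDataFast tvar load >>= putMVar slot)
    results <- mapM takeMVar slots
    count   <- readIORef loads
    pure (count, results)
```

Running demo 8 in GHCi should report a single load and eight identical values. One caveat of this sketch (and of the version it is based on): if the loader throws, the state stays at Initializing and the waiting readers block, so real code would probably want to reset the state in an exception handler.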