haskell memory-leaks functional-programming atomic ioref

How can atomicModifyIORef cause leaks? And why does atomicModifyIORef' solve the problem?


If I search for IORef a -> (a -> (a, b)) -> IO b on Hoogle, the first result is

atomicModifyIORef :: IORef a -> (a -> (a, b)) -> IO b

base Data.IORef

Atomically modifies the contents of an IORef.

This function is useful for using IORef in a safe way in a multithreaded program. If you only have one IORef, then using atomicModifyIORef to access and modify it will prevent race conditions.

Extending the atomicity to multiple IORefs is problematic, so it is recommended that if you need to do anything more complicated then using MVar instead is a good idea.

atomicModifyIORef does not apply the function strictly. This is important to know even if all you are doing is replacing the value. For example, this will leak memory:

ref <- newIORef '1'
forever $ atomicModifyIORef ref (\_ -> ('2', ()))

Use atomicModifyIORef' or atomicWriteIORef to avoid this problem.

This function imposes a memory barrier, preventing reordering; see Data.IORef#memmodel for details.

(I'm not sure why, if I click on any of the three links, the resulting doc page doesn't seem to contain the text will leak memory, which is contained in the excerpt above.)

The question is twofold:

  • Why is the above example leaking memory?
  • Why doesn't the leak happen if atomicModifyIORef' is used instead of atomicModifyIORef?

Solution

  • Let's run the code.

    ref <- newIORef '1'
    

    After this line, the IORef contains just '1'.

    Let's apply this action once:

    atomicModifyIORef ref (\_ -> ('2', ()))
    

    After this line, the IORef contains, in essence, (\_ -> '2') '1'. Note that, because of laziness, this is not simplified to '2' but is kept as an unevaluated thunk. (atomicModifyIORef' would instead force the new value, so the IORef would end up holding plain '2'.)

    Once more, let's apply this action:

    atomicModifyIORef ref (\_ -> ('2', ()))
    

    Now the IORef contains (\_ -> '2') ((\_ -> '2') '1').

    And so on; see the pattern? Each new thunk keeps a reference to the previous contents, so nothing can be garbage collected. We build larger and larger unevaluated thunks, wasting memory, when we could (and should) simplify the contents.
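
The difference between the two functions can be sketched with a small program. This is only an illustration under a few assumptions: countWith and modifyStrict are hypothetical names, not part of base, and a counter is used instead of the constant '2' so that the result can be checked. modifyStrict shows one way to build a strict variant on top of the lazy atomicModifyIORef by forcing the new contents and the result to weak head normal form; the real atomicModifyIORef' may be implemented differently depending on the version of base.

```haskell
import Control.Monad (replicateM_)
import Data.IORef (IORef, atomicModifyIORef, atomicModifyIORef', newIORef, readIORef)

-- Hypothetical helper (not part of base): a strict wrapper written in terms
-- of the lazy atomicModifyIORef. Forcing the result also forces the pair and
-- the new contents, so no chain of thunks builds up in the IORef.
modifyStrict :: IORef a -> (a -> (a, b)) -> IO b
modifyStrict ref f = do
    b <- atomicModifyIORef ref $ \old ->
        case f old of
            pair@(new, _) -> new `seq` pair
    b `seq` return b

-- Hypothetical helper: increment a counter n times with the given modify
-- function, then read the counter back.
countWith :: (IORef Int -> (Int -> (Int, ())) -> IO ()) -> Int -> IO Int
countWith modify n = do
    ref <- newIORef 0
    replicateM_ n $ modify ref (\x -> (x + 1, ()))
    readIORef ref

main :: IO ()
main = do
    -- Lazy: the IORef ends up holding a chain of n suspended (+ 1)
    -- applications, which is only evaluated when the result is printed.
    lazy    <- countWith atomicModifyIORef  100000
    -- Strict: the contents are evaluated after every update.
    strict  <- countWith atomicModifyIORef' 100000
    wrapped <- countWith modifyStrict       100000
    print (lazy, strict, wrapped)  -- prints (100000,100000,100000)
```

All three variants compute the same number; they differ only in how much memory is held while the loop runs. With forever in place of replicateM_ n, the lazy variant's chain would never be forced and would keep growing, which is the leak described in the documentation.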