I have been trying to encode an algorithm in Haskell that requires using lots of mutable references, but it is (perhaps not surprisingly) very slow in comparison to purely lazy code. Consider a very simple example:
module Main where
import Data.IORef
import Control.Monad
import Control.Monad.Identity
list :: [Int]
list = [1..10^6]
main1 = mapM newIORef list >>= mapM readIORef >>= print
main2 = print $ map runIdentity $ map Identity list
Running GHC 7.8.2 on my machine, main1
takes 1.2s and uses 290MB of memory, while main2
takes only 0.4s and uses a mere 1MB. Is there any trick to prevent this growth, especially in space? I often need IORef
s for non-primitive types unlike Int
, and assumed that an IORef
would use an additional pointer much like a regular thunk, but my intuition seems to be wrong.
I have already tried a specialized list type with an unpacked IORef
, but with no significant difference.
The problem is your use of mapM
, which always performs poorly on large lists both in time and space. The correct solution is to fuse away the intermediate lists by using mapM_
and (>=>)
:
import Data.IORef
import Control.Monad
list :: [Int]
list = [1..10^6]
main = mapM_ (newIORef >=> readIORef >=> print) list
This runs in constant space and gives excellent performance, running in 0.4 seconds on my machine.
Edit: In answer to your question, you can also do this with pipes
to avoid having to manually fuse the loop:
import Data.IORef
import Pipes
import qualified Pipes.Prelude as Pipes
list :: [Int]
list = [1..10^6]
main = runEffect $
each list >-> Pipes.mapM newIORef >-> Pipes.mapM readIORef >-> Pipes.print
This runs in constant space in about 0.7 seconds on my machine.