Search code examples
haskellstate-monad

Generate a sequential or random value each time a function is called


I need to make each instance of Sphere get a unique identifier so that no two Spheres are equal. I won't know ahead of time how many spheres I'll need to make so will need to make them one at a time, but still increment the identifier.

Most solutions I've tried have this issue where I end up with an IO a and need the unsafePerformIO to get the value.

This code comes close, but the resulting identifier is always the same:

module Shape ( Sphere (..)
             , sphere
             , newID
             ) where

import System.Random
import System.IO.Unsafe (unsafePerformIO)

data Sphere = Sphere { identifier :: Int
                     } deriving (Show, Eq)

sphere :: Sphere
sphere = Sphere { identifier = newID }

newID :: Int
newID = unsafePerformIO (randomRIO (1, maxBound :: Int))

This would work as well, and works great in the REPL, but when I put it in a function, it only returns a new value the first time and the same value after that.

import Data.Unique
sphere = Sphere { identifier = (hashUnique $ unsafePerformIO newUnique) }

I know think this all leads to the State Monad, but I don't understand that yet. Is there no other way that will "get the job done", without biting off all the other monad stuff?


Solution

  • First of all, don’t use unsafePerformIO here. It doesn’t do what you want anyway: it doesn’t “get the a out of an IO a”, since an IO a doesn’t contain an a; rather, unsafePerformIO hides an IO action behind a magical value that executes the action when somebody evaluates the value, which could happen multiple times or never because of laziness.

    Is there no other way that will "get the job done", without biting off all the other monad stuff?

    Not really. You’re going to have to maintain some kind of state if you want to generate unique IDs. (You may be able to avoid needing unique IDs altogether, but I don’t have enough context to say.) State can be handled in a few ways: manually passing values around, using State to simplify that pattern, or using IO.

    Suppose we want to generate sequential IDs. Then the state is just an integer. A function that generates a fresh ID can simply take that state as input and return an updated state. I think you’ll see straight away why that’s too simple, so we tend to avoid writing code like this:

    -- Differentiating “the next-ID state” from “some ID” for clarity.
    newtype IdState = IdState Id
    
    type Id = Int
    
    -- Return new sphere and updated state.
    newSphere :: IdState -> (Sphere, IdState)
    newSphere s0 = let
      (i, s1) = newId s0
      in (Sphere i, s1)
    
    -- Return new ID and updated state.
    newId :: IdState -> (Id, IdState)
    newId (IdState i) = (i, IdState (i + 1))
    
    newSpheres3 :: IdState -> ((Sphere, Sphere, Sphere), IdState)
    newSpheres3 s0 = let
      (sphere1, s1) = newSphere s0
      (sphere2, s2) = newSphere s1
      (sphere3, s3) = newSphere s2
      in ((sphere1, sphere2, sphere3), s3)
    
    main :: IO ()
    main = do
    
      -- Generate some spheres with an initial ID of 0.
      -- Ignore the final state with ‘_’.
      let (spheres, _) = newSpheres3 (IdState 0)
    
      -- Do stuff with them.
      print spheres
    

    Obviously this is very repetitive and error-prone, since we have to pass the correct state along at each step. The State type has a Monad instance that abstracts out this repetitive pattern and lets you use do notation instead:

    import Control.Monad.Trans.State (State, evalState, state)
    
    newSphere :: State IdState Sphere
    newSphere = do
      i <- newId
      pure (Sphere i)
    -- or:
    -- newSphere = fmap Sphere newId
    -- newSphere = Sphere <$> newId
    
    -- Same function as before, just wrapped in ‘State’.
    newId :: State IdState Id
    newId = state (\ (IdState i) -> (i, IdState (i + 1)))
    
    -- Much simpler!
    newSpheres3 :: State IdState (Sphere, Sphere, Sphere)
    newSpheres3 = do
      sphere1 <- newSphere
      sphere2 <- newSphere
      sphere3 <- newSphere
      pure (sphere1, sphere2, sphere3)
      -- or:
      -- newSpheres3 = (,,) <$> newSphere <*> newSphere <*> newSphere
    
    main :: IO ()
    main = do
    
      -- Run the ‘State’ action and discard the final state.
      let spheres = evalState newSpheres3 (IdState 0)
    
      -- Again, do stuff with the results.
      print spheres
    

    State is what I would reach for normally, since it can be used within pure code, and combined with other effects without much trouble using StateT, and because it’s actually immutable under the hood, just an abstraction on top of passing values around, you can easily and efficiently save and roll back states.

    If you want to use randomness, Unique, or make your state actually mutable, you generally have to use IO, because IO is specifically about breaking referential transparency like that, typically by interacting with the outside world or other threads. (There are also alternatives like ST for putting imperative code behind a pure API, or concurrency APIs like Control.Concurrent.STM.STM, Control.Concurrent.Async.Async, and Data.LVish.Par, but I won’t go into them here.)

    Fortunately, that’s very similar to the State code above, so if you understand how to use one, it should be easier to understand the other.

    With random IDs using IO (not guaranteed to be unique):

    import System.Random
    
    newSphere :: IO Sphere
    newSphere = Sphere <$> newId
    
    newId :: IO Id
    newId = randomRIO (1, maxBound :: Id)
    
    newSpheres3 :: IO (Sphere, Sphere, Sphere)
    newSpheres3 = (,,) <$> newSphere <*> newSphere <*> newSphere
    
    main :: IO ()
    main = do
      spheres <- newSpheres3
      print spheres
    

    With Unique IDs (also not guaranteed to be unique, but unlikely to collide):

    import Data.Unique
    
    newSphere :: IO Sphere
    newSphere = Sphere <$> newId
    
    newId :: IO Id
    newId = hashUnique <$> newUnique
    
    -- …
    

    With sequential IDs, using a mutable IORef:

    import Data.IORef
    
    newtype IdSource = IdSource (IORef Id)
    
    newSphere :: IdSource -> IO Sphere
    newSphere s = Sphere <$> newId s
    
    newId :: IdSource -> IO Id
    newId (IdSource ref) = do
      i <- readIORef ref
      writeIORef ref (i + 1)
      pure i
    
    -- …
    

    You’re going to have to understand how to use do notation and functors, applicatives, and monads at some point, because that’s just how effects are represented in Haskell. You don’t necessarily need to understand every detail of how they work internally in order to just use them, though. I got pretty far when I was learning Haskell with some rules of thumb, like:

    • A do statement can be:

      • An action: (action :: m a)

        • Often m () in the middle

        • Often pure (expression :: a) :: m a at the end

      • A let binding for expressions: let (var :: a) = (expression :: a)

      • A monadic binding for actions: (var :: a) <- (action :: m a)

    • f <$> action applies a pure function to an action, short for do { x <- action; pure (f x) }

    • f <$> action1 <*> action2 applies a pure function of multiple arguments to multiple actions, short for do { x <- action1; y <- action2; pure (f x y) }

    • action2 =<< action1 is short for do { x <- action1; action2 x }