Search code examples
haskellfunctional-programmingreturnmonadsreturn-type

How does return work within a Monad context within Haskell


Why does the following, given that the RegModule instance defines its return as: return a = RegModule (\_s _i d -> return (a, 0, d)), not return [((c, i+1, d), Int, d)], and why does the second case expression not need to be written with return []:

scanChar :: RegModule d Char
scanChar = RegModule (\s i d ->
  case drop i s of
    (c:cs) -> return (c, i+1, d)
    [] -> []
  )

In MVE:

import qualified Data.Set as S

import Control.Monad

type CharSet = S.Set Char

data RE =
    RClass Bool CharSet

newtype RegModule d a =
  RegModule {runRegModule :: String -> Int -> d -> [(a, Int, d)]}

instance Monad (RegModule d) where
  return a = RegModule (\_s _i d -> return (a, 0, d))
  m >>= f =
    RegModule (\s i d -> do (a, j, d') <- runRegModule m s i d
                            (b, j', d'') <- runRegModule (f a) s (i + j) d'
                            return (b, j + j', d''))

instance Functor (RegModule d) where fmap = liftM
instance Applicative (RegModule d) where pure = return; (<*>) = ap

scanChar :: RegModule d Char
scanChar = RegModule (\s i d ->
  case drop i s of
    (c:cs) -> return (c, i+1, d)
    [] -> []
  )

regfail :: RegModule d a
regfail = RegModule (\_s _i d -> []
                )
regEX :: RE -> RegModule [String] ()
regEX (RClass b cs) = do
  next <- scanChar  
  if (S.member next cs)
    then return ()
    else regfail
 
runRegModuleThrice :: RegModule d a -> String -> Int -> d -> [(a, Int, d)]
runRegModuleThrice matcher input startPos state =
  let (result1, pos1, newState1) = head $ runRegModule matcher input startPos state
      (result2, pos2, newState2) = head $ runRegModule matcher input pos1 newState1
      (result3, pos3, newState3) = head $ runRegModule matcher input pos2 newState2
  in [(result1, pos1, newState1), (result2, pos2, newState2), (result3, pos3, newState3)]

Solution

  • It's a matter of type inference. In the code:

    scanChar :: RegModule d Char
    scanChar = RegModule (\s i d ->
      case drop i s of
        (c:cs) -> return (c, i+1, d)
        [] -> []
      )
    

    the RegModule constructor on the right hand side has type:

    RegModule :: (String -> Int -> d -> [(a, Int, d)]) -> RegModule d a
    

    Since scanChar has type signature RegModule d Char, Haskell unifies the types RegModule d Char (on the left) with RegModule d a (on the right), resulting in the unification of the type variable a with Char. Therefore, the expected type of the function passed to the RegModule constructor is:

    String -> Int -> d -> [(Char, Int, d)]
    

    Note that, so far, we've only considered the type signature for scanChar, and the type of the constructor RegModule. Nothing from the various instances for RegModule has been used.

    Anyway, the anonymous lambda \s i d -> ... gets unified with this type, so the arguments s, i, and d must have type String, Int, and d respectively, as you'd expect from their names, while the body must have type [(Char, Int, d)]. Since the body of the anonymous lambda is a case expression, every expression on the right-hand side of one of the cases must have that same type, [(Char, Int, d)]. So, we have:

    return (c, i+1, d) :: [(Char, Int, d)]  -- for the first case
    

    and:

    [] :: [(Char, Int, d)]                  -- for the second case
    

    The return function is polymorphic with type (Monad m) => t -> m t, so to unify this particular return expression with its type, Haskell unifies m with the list functor [] and t with the type of the argument passed to (c, i+1, d), namely (Char, Int, d). This instantiates return to the type for the [] monad:

    return :: t -> [t]
    

    and the list for elements of type (Char, Int, d) in particular:

    return :: (Char, Int, d) -> [(Char, Int, d)]
    

    So, it's the return function for the list monad that's used, and -- given its definition -- you could replace the code with:

    ...
    (c:cs) -> [(c, i+1, d)]
    ...
    

    and it would have the same meaning.

    On the other hand, if the [] on the RHS of the last case was replaced with return [], the compiler would attempt to unify the general type signature for return:

    return :: (Monad m) => t -> m t
    

    by unifying the argument type t with the type of the empty list [] :: [u] giving the type equality:

    t ~ [u]
    

    for some u, while at the same type unifying the type of the return [] expression m t with the expected type based on the type inference above, namely [(Char, Int, d)], giving the type equalities:

    m ~ []
    t ~ (Char, Int, d)
    

    Since the type equality t ~ [u] is incompatible with the type equality t ~ (Char, Int, d), this would be a type error:

    Return.hs:29:18-19: error:
        • Couldn't match expected type: (Char, Int, d)
                      with actual type: [a0]
    

    In the end, none of the type inference above uses the type of return for RegModule d monad.