Search code examples
haskellfunctional-programmingmonadsinterpretermonad-transformers

Running Monad multiple times in haskell and updating state


Hi for the following code I have tried to run the RegModule monad such that one state will effect the next, as seen in the output at the end of the code (MVE) below the state of the monad doesn't seem to be updated as the increment only works for the first call to the monad rather than increment on each of the runRegModule calls. How can I ensure that its state get updated. And is there a way to do it that would be less manual such that the Monad would be called until it fails using forM or the like.

import qualified Data.Set as S

import Control.Monad

type CharSet = S.Set Char

data RE =
    RClass Bool CharSet

newtype RegModule d a =
  RegModule {runRegModule :: String -> Int -> d -> [(a, Int, d)]}

instance Monad (RegModule d) where
  return a = RegModule (\_s _i d -> return (a, 0, d))
  m >>= f =
    RegModule (\s i d -> do (a, j, d') <- runRegModule m s i d
                            (b, j', d'') <- runRegModule (f a) s (i + j) d'
                            return (b, j + j', d''))

instance Functor (RegModule d) where fmap = liftM
instance Applicative (RegModule d) where pure = return; (<*>) = ap

scanChar :: RegModule d Char
scanChar = RegModule (\s i d ->
  case drop i s of
    (c:cs) -> return (c, 1, d)
    [] -> []
  )

regfail :: RegModule d a
regfail = RegModule (\_s _i d -> []
                )
regEX :: RE -> RegModule [String] ()
regEX (RClass b cs) = do
  next <- scanChar  
  if (S.member next cs)
    then return ()
    else regfail
 
runRegModuleThrice :: RegModule d a -> String -> Int -> d -> [(a, Int, d)]
runRegModuleThrice matcher input startPos state =
  let (result1, pos1, newState1) = head $ runRegModule matcher input startPos state
      (result2, pos2, newState2) = head $ runRegModule matcher input pos1 newState1
      (result3, pos3, newState3) = head $ runRegModule matcher input pos2 newState2
  in [(result1, pos1, newState1), (result2, pos2, newState2), (result3, pos3, newState3)]

OUTPUT:

ghci> runRegModuleThrice (regEX (RClass False (S.singleton 'a'))) "aaa" 0 []
[((),1,[]),((),1,[]),((),1,[])]

Expected result:

ghci> runRegModuleThrice (regEX (RClass False (S.singleton 'a'))) "aaa" 0 []
[((),1,[]),((),2,[]),((),3,[])]

EDIT in order to answer comments

There are several things about the suggestiosn in the comments that confuses me. Therefore I find it necessary to edit the post rather than just using the comment section to respond.

Furthermore I shall here try to directionalize the problem-solving in order to respond to especially the comments made with regards to the similarity of the current problem to the post How to use `runAccumT` , this also to open up for the possibility of the correct answer/solution being some answer/solution that might not be how I initially thought it to be, and hence why I shall also be open to that this answer/solution might not produce the output that I orginially expected.

As I understand the comments (and this might be a falacious understanding since I am still quite new at Haskell), they highlight and claim that my line newtype RegModule d a = RegModule {runRegModule :: String -> Int -> d -> [(a, Int, d)]} is the cause of the error, since it doesn't take into account an accumulated value, but just a single Int, when running the Monad RegModule. Therefore if the code above is to succeed I am to change it into something like newtype RegModule d a = RegModule {runRegModule :: String -> Int -> d -> [(a, Sum Int, d)]} instead since the input value isn't the Sum Int but rather just an Int which cannot accumulate values because it isn't a Monoidically value.

Some concerns about the comments at stake:

The problem I posted above including the Monad and the runner monad are code that I have from an assignment text. It should be correct. However it is supposed to be code that should make a basis for several code parts. That is, a regex parser and a Matcher (Interpreter). This code is not supposed to be changed. I am however not that skilled at Haskell so I thought it would be relevant to have some 'printable' or visible code that would show some output while I was building the remaining code. That is why I suggested the runRegModuleThrice to manually get a feeling for how values accumulated (e. g. while the parser ran over an input string).

It can be the case that when trying to make it possible to print the output, that I am somehow forcing the coding in a way that is deviating from what was inteded with the original assignment code for how a parser and matcher is supposed to be combined.

I am however not sure what to expect for a parser and whether or not the regEX module provided along with the other parts are supposed to give some list akin to the above mentioned expected outputs or perhaps something more similar to [((),1,['a']),((),2,['a']),((),3,['a'])], or not. Although I suspect they are.

Some follow up questions:

  1. If it is the case that I interpreted it correct with regards to the above about the lack of a Monoidically value, isn't much of the idea behind Monads lost including the ones that were depicted in the LYAH: http://learnyouahaskell.com/for-a-few-monads-more about the Writer Monad.

  2. Despite creating changes to the runner monad are there ways to show an accumulated value or 'appended text' as the output?

  3. Isn't it likely that there might be some way to archieve the outcome [((),1,['a']),((),2,['a']),((),3,['a'])] (or an even more 'correct' variant), by solely changing the regEx, scanChar, ``regfailand/orrunRegModuleThrice? Th is could perhaps be archieved by usingdo-notation for the runRegModuleThrice`, as I understand it the do notation is used for unpacking the value of a Monad and possibly using it in the next. However this doesn't seem to work.

  4. In general from examples I have encountered in LYAH it hasn't been stressed that the runner should be changed. I am a bit confused to why that is, and if it is something in a later release of Haskell.


Solution

  • I'll respectfully disagree with the other answer here. The underlying problem is that you've misinterpreted the output Int in the tuple (a, Int, d). For this monad, the input Int is the offset into the string, but the output Int is the number of characters scanned, not the new offset. This means that -- using the original code posted in the question -- your actual output is correct:

    ghci> runRegModuleThrice (regEX (RClass False (S.singleton 'a'))) "aaa" 0 []
    [((),1,[]),((),1,[]),((),1,[])]
    

    because those output Ints are the number of characters scanned, not the new offsets. Since the matcher always matches a single character, if it matches anything at all, each of the three runs of the matcher should return 1, as in 1 character scanned.

    However, your code is still buggy, because if you try a test case with the input string "aa", it still matches:

    ghci> runRegModuleThrice (regEX (RClass False (S.singleton 'a'))) "aaa" 0 []
    [((),1,[]),((),1,[]),((),1,[])]
    

    The problem is that, in runRegModuleThrice, you are reimplementing the bind (>>=) operation for the monad but your implementation is incompatible with the actually "meaning" of the input and output Ints in the monad.

    As I noted above, the definition for >>= in the monad instance is set up to treat the input Int as an offset into the string but the output Int as the number of characters scanned, and when the bind chains the left-hand-side to the right-hand-side, it adds this output Int (number of characters scanned) to the original input Int (starting offset) to get the new input Int (next offset). It also returns the sum of the two output Ints -- the number of characters scanned by the LHS plus the number of characters scanned by the RHS. If the output Ints were offsets, it would just return the final offset j' directly, not try to add it to another offset:

    m >>= f =
      RegModule (\s i d -> do (a, j, d') <- runRegModule m s i d
      --            ^             ^                          ^
      --            |             |                          |
      --            |   output count of characters scanned   |
      --            |                                        |
      --            `-- starting offset as input ------------'
    
                              (b, j', d'') <- runRegModule (f a) s (i + j) d'
      -- new offset is *sum* of starting offset and number scanned -^^^^^
    
                              return (b, j + j', d''))
      --                                 ^^^^^^
      --        returned value is total number of characters scanned, j from
      --        the first, and j' from the second, **NOT** the new offset
    

    Assuming the monad instance is correct, your original scanChar implementation was also correct. When it succeeds, it scans a single character, so it should always return 1 on success.

    The bug is in your implementation of runRegModuleThrice which is misinterpreting the output Int as an offset. If you modify it to treat the input Ints as offsets and the output Ints as counts of characters scanned:

    runRegModuleThrice :: RegModule d a -> String -> Int -> d -> [(a, Int, d)]
    runRegModuleThrice matcher input startPos state =
      let (result1, cnt1, newState1) = head $ runRegModule matcher input startPos state
          (result2, cnt2, newState2) = head $ runRegModule matcher input (startPos + cnt1) newState1
          (result3, cnt3, newState3) = head $ runRegModule matcher input (startPos + cnt1 + cnt2) newState2
      in [(result1, cnt1, newState1), (result2, cnt2, newState2), (result3, cnt3, newState3)]
    

    then it should work correctly. Note that it STILL will return the output you claimed was wrong:

    ghci> runRegModuleThrice (regEX (RClass False (S.singleton 'a'))) "aaa" 0 []
    [((),1,[]),((),1,[]),((),1,[])]
    

    but this output is actually correct, as explained above.

    It'll crash on an input of "aa":

    λ> runRegModuleThrice (regEX (RClass False (S.singleton 'a'))) "aa" 0 []
    [((),1,[]),((),1,[]),(*** Exception: Prelude.head: empty list
    CallStack (from HasCallStack):
      error, called at libraries/base/GHC/List.hs:1646:3 in base:GHC.List
      errorEmptyList, called at libraries/base/GHC/List.hs:85:11 in base:GHC.List
      badHead, called at libraries/base/GHC/List.hs:81:28 in base:GHC.List
      head, called at Collect.hs:54:36 in main:Collect
    

    but this is correct, too. Your runRegModuleThrice is written to assume that each run of the matcher succeeds and returns at least one match. When a run fails (like the third match failing here), it tries to take the head of an empty list and crashes, as designed.

    I think what you were SUPPOSED to do here was implement runRegModuleThrice using the monad's implementation of >>=, not reimplement it from scratch, so maybe something more like:

    runRegModuleThrice' :: RegModule d a -> String -> Int -> d -> [(a, Int, d)]
    runRegModuleThrice' matcher = runRegModule (matcher >> matcher >> matcher)
    

    which uses the monad bind (i.e., the definition of >>=) via the >> operator.

    This does not return a "history" of the three matches. Instead, it returns the result of matching on the composite monadic match. This should return the a value of the final match, which is just () for your test case. It should also return the total number of character matched by the three matchers that are part of the composite match, which is the number 3, i.e., three matches of 1 character each.

    And, in fact, it does just that:

    λ> runRegModuleThrice' (regEX (RClass False (S.singleton 'a'))) "aaa" 0 []
    [((),3,[])]
    

    It also works correctly on strings that don't match without crashing:

    λ> runRegModuleThrice' (regEX (RClass False (S.singleton 'a'))) "aa" 0 []
    []
    λ> runRegModuleThrice' (regEX (RClass False (S.singleton 'a'))) "a" 0 []
    []
    λ> runRegModuleThrice' (regEX (RClass False (S.singleton 'a'))) "bbb" 0 []
    []