Problem
I am rolling a nondeterministic parser for self-education, it looks like:
newtype Parser a b = P { runP :: [a] -> [(b, [a])] }
And I want to be able to drop in a subroutine that might be of form: [a] -> [b]
, which takes a buffer of chars and sends it to a list of results. The trick here is that the subroutine itself is a stateful computation, and it transitions through states upon each successful invocation (think of it as a finite state machine). Specifically:
[]
, the parser inserts one more char into the buffer and yields it to the subroutine, which runs again. [b]
, the buffer is first flushed and the parser inserts one more char into the buffer, yielding it to subroutine. The [b]
is stored somewhereOnce the escape condition is reached, the subroutine yields the results bs
back to the parser, and combines it with the remaining stream of as
like so:
rs = fmap (flip (,) as) bs :: [(b,[a])]
thus satisfiying the signature of runP
The function might have this signature: withf :: ([a] -> [b]) -> Parser a b
The important thing is that withf g
has to be a parser, so I can build larger parsers using <*>
. Note the function signature suggest g
is a pure function, so it's unlikely to be correct.
Tried Solutions
I tried to implement this using various coroutine packages, but to me it makes more sense to lift
a parser into a coroutine computation context, composing it with a transducer which is also lifted into context, which means it's no longer a parser.
I also tried to implement withf
as a primitive function that would have access to the Parser value constructor. Basically translating steps 1..4 into code. The biggest problem I have here is who is in charge of what information:
withf
I also tried various home-brew coroutine implementations baked right into the parser (so not using Parser
type defined above), with little success.
Anyone who could point me in the right direction is greatly appreciated.
First let's use MonadPlus
instead of []
in the parser. It will make it more generic and clarify the code a bit (we won't have so many nested []
s):
newtype Parser a m b = P { runP :: [a] -> m (b, [a]) }
I suggest you to change the signature of your subroutines. What you need is to:
This can be easily accomplished by this type signature:
newtype Sub a b = Sub { runSub :: Either (a -> Sub a b) [b] }
A subroutine either yields a result, or requests a new input and produces a new subroutine. This way, you can keep whatever state you need by passing it into the returned subroutine. The conversion function will then look like:
withf :: (MonadPlus m) => Sub a b -> Parser a m b
withf start = P $ f (runSub start)
where
f (Right bs) xs = msum [ return (b, xs) | b <- bs ]
f (Left r) [] = mzero -- No more input, can't proceed.
f (Left r) (x:xs) = f (runSub (r x)) xs
Update: Another approach we could take is to realize that the parser is actually a StateT
transformer, whose state is [a]
:
type Parser a m b = StateT [a] m b
runP :: (Monad m) => Parser a m b -> [a] -> m (b, [a])
runP = runStateT
Indeed, runP
is exactly runStateT
!
This way, we get a Monad
instance for Parser
for free. And now we can split our task into smaller blocks. First, we create a parser that consumes one input, or fails:
receive :: (MonadPlus m) => Parser a m a
receive = get >>= f
where
f [] = mzero -- No more input, can't proceed.
f (x:xs) = put xs >> return x
and then use it to describe withf
:
withf :: (MonadPlus m) => Sub a b -> Parser a m b
withf start = f (runSub start)
where
f (Right bs) = msum (map return bs)
f (Left r) = receive >>= f . runSub . r
Notice that if m
is a MonadPlus
then also StateT s m
is a MonadPlus
, so we can use directly mzero
and msum
with Parser
.