Search code examples
parsinghaskellsubroutinecoroutine

What is an elegant way to drop a subroutine into a parser?


Problem

I am rolling a nondeterministic parser for self-education, it looks like:

newtype Parser a b = P { runP :: [a] -> [(b, [a])] }

And I want to be able to drop in a subroutine that might be of form: [a] -> [b], which takes a buffer of chars and sends it to a list of results. The trick here is that the subroutine itself is a stateful computation, and it transitions through states upon each successful invocation (think of it as a finite state machine). Specifically:

  1. If the subroutine outputs the empty list [], the parser inserts one more char into the buffer and yields it to the subroutine, which runs again.
  2. If the subroutine outputs non-empty list [b], the buffer is first flushed and the parser inserts one more char into the buffer, yielding it to subroutine. The [b] is stored somewhere
  3. Until an escape condition is reached, steps 1 and 2 are run over and over again. All intermediate results should be combined in some way.
  4. Once the escape condition is reached, the subroutine yields the results bs back to the parser, and combines it with the remaining stream of as like so:

    rs = fmap (flip (,) as) bs :: [(b,[a])]

thus satisfiying the signature of runP

The function might have this signature: withf :: ([a] -> [b]) -> Parser a b

The important thing is that withf g has to be a parser, so I can build larger parsers using <*>. Note the function signature suggest g is a pure function, so it's unlikely to be correct.

Tried Solutions

I tried to implement this using various coroutine packages, but to me it makes more sense to lift a parser into a coroutine computation context, composing it with a transducer which is also lifted into context, which means it's no longer a parser.

I also tried to implement withf as a primitive function that would have access to the Parser value constructor. Basically translating steps 1..4 into code. The biggest problem I have here is who is in charge of what information:

  1. state of buffer
  2. state of intermediate results
  3. the logic of how intermediate results are to be combined.
  4. how should escape condition be implemented, or better yet composed into withf

I also tried various home-brew coroutine implementations baked right into the parser (so not using Parser type defined above), with little success.

Anyone who could point me in the right direction is greatly appreciated.


Solution

  • First let's use MonadPlus instead of [] in the parser. It will make it more generic and clarify the code a bit (we won't have so many nested []s):

    newtype Parser a m b = P { runP :: [a] -> m (b, [a]) }
    

    I suggest you to change the signature of your subroutines. What you need is to:

    • signal if a subroutine needs more input or not, and
    • keep the state somewhere.

    This can be easily accomplished by this type signature:

    newtype Sub a b = Sub { runSub :: Either (a -> Sub a b) [b] }
    

    A subroutine either yields a result, or requests a new input and produces a new subroutine. This way, you can keep whatever state you need by passing it into the returned subroutine. The conversion function will then look like:

    withf :: (MonadPlus m) => Sub a b -> Parser a m b
    withf start = P $ f (runSub start)
      where
        f (Right bs) xs   = msum [ return (b, xs) | b <- bs ]
        f (Left r)   []   = mzero    -- No more input, can't proceed.
        f (Left r) (x:xs) = f (runSub (r x)) xs
    

    Update: Another approach we could take is to realize that the parser is actually a StateT transformer, whose state is [a]:

    type Parser a m b = StateT [a] m b
    
    runP :: (Monad m) => Parser a m b -> [a] -> m (b, [a])
    runP = runStateT
    

    Indeed, runP is exactly runStateT!

    This way, we get a Monad instance for Parser for free. And now we can split our task into smaller blocks. First, we create a parser that consumes one input, or fails:

    receive :: (MonadPlus m) => Parser a m a
    receive = get >>= f
      where
        f []        = mzero    -- No more input, can't proceed.
        f (x:xs)    = put xs >> return x
    

    and then use it to describe withf:

    withf :: (MonadPlus m) => Sub a b -> Parser a m b
    withf start = f (runSub start)
      where
        f (Right bs) = msum (map return bs)
        f (Left r)   = receive >>= f . runSub . r
    

    Notice that if m is a MonadPlus then also StateT s m is a MonadPlus, so we can use directly mzero and msum with Parser.