Search code examples
parsinghaskellcombinators

Can not understand 'pMany' and 'pMany1' in the parser combinator


I am trying to understand the pMany and pMany1 functions in the parser combinator

newtype Parser s t = P([s] -> [(t, [s])])
pMany, pMany1 :: Parser s a → Parser s [a] 
pMany p =(:) <$> p <*> pMany p `opt` [] 
pMany1 p = (:) <$> p <*> pMany p

As I understand, pMany runs the parser as many times as possible and collects the final result in [a]. What I don't understand is how it keeps track of the results between each run. The applicative combination is context-free and should not remember the state in between. Is that correct?

Thanks a lot!


Solution

  • Let's break it down;

    pMany p = (:) <$> p <*> pMany p `opt` [] 
    

    basically means

    pMany p = (fmap (:) p <*> pMany p) `opt` [] 
    

    This expression consists of two parts:

    • Parse fmap (:) p <*> pMany p
    • If above fails, be okay with empty list as result

    The idea here is to try to parse "an element and try to parse more" or "don't parse anything" if the previous step didn't succeed. I assume the second part is understandable, let's focus on the first.

    What we will need here is to understand how do the fmap and <*> work:

    • fmap is pretty simple: it takes a function a -> b, a parser Parser s a and returns Parser s b. This allows us to explicitly manipulate the results of the parser without actually running it.
    • <*> works exactly the same as fmap with a little difference that the function is itself a result of a parsing. In (subjectively) most sane implementation it:
      • Runs the left side parser (consuming input) that returns a function
      • Runs the right side parser (on the remaining input) that returns an argument
      • Combines the above into a single parser that returns function applied to mentioned argument

    So what happens in this mysterious fmap (:) p <*> pMany p:

    • First we parse some object of type a using parser p.
    • Then, inside of the parsing context, we apply function (:) :: a -> [a] -> [a] to it. So if we have parsed, let's say, an int 2137, we now have (:) 2137 which is the same as \rest -> 2137:rest. At this point we have parser of type Parser s ([a] -> [a]).
    • Next step is to parse the right side of the <*> operator which is recursive call to pMany. We can read this as "proceed with the same algorithm". Effectively, we parse the rest of the elements. This yields (according to the type of pMany) Parser s [a].
    • Finally, we apply the previous result over the last one getting a parser that attaches the left element (parsed with p) to the following ones (parsed with pMany p). From the type of <*> we can infer that the resulting type will be Parser s [a] as expected.

    This code is semantically equivalent to

    pMany p = (do
        someElem <- p
        restElems <- pMany p
        return (someElem : restElems)
      ) `opt` []
    

    pMany1 does the same trick, but fails if it can't parse the first element. Note that it calls pMany after that which doesn't have this property. As a result we force it to parse at least one thing ("parse one and then parse any number").