I am trying to understand the pMany and pMany1 functions in the parser combinator
newtype Parser s t = P([s] -> [(t, [s])])
pMany, pMany1 :: Parser s a → Parser s [a]
pMany p =(:) <$> p <*> pMany p `opt` []
pMany1 p = (:) <$> p <*> pMany p
As I understand, pMany runs the parser as many times as possible and collects the final result in [a]. What I don't understand is how it keeps track of the results between each run. The applicative combination is context-free and should not remember the state in between. Is that correct?
Thanks a lot!
Let's break it down;
pMany p = (:) <$> p <*> pMany p `opt` []
basically means
pMany p = (fmap (:) p <*> pMany p) `opt` []
This expression consists of two parts:
fmap (:) p <*> pMany p
The idea here is to try to parse "an element and try to parse more" or "don't parse anything" if the previous step didn't succeed. I assume the second part is understandable, let's focus on the first.
What we will need here is to understand how do the fmap
and <*>
work:
fmap
is pretty simple: it takes a function a -> b
, a parser Parser s a
and returns Parser s b
. This allows us to explicitly manipulate the results of the parser without actually running it.<*>
works exactly the same as fmap
with a little difference that the function is itself a result of a parsing. In (subjectively) most sane implementation it:
So what happens in this mysterious fmap (:) p <*> pMany p
:
a
using parser p
.(:) :: a -> [a] -> [a]
to it. So if we have parsed, let's say, an int 2137
, we now have (:) 2137
which is the same as \rest -> 2137:rest
. At this point we have parser of type Parser s ([a] -> [a])
.<*>
operator which is recursive call to pMany
. We can read this as "proceed with the same algorithm". Effectively, we parse the rest of the elements. This yields (according to the type of pMany
) Parser s [a]
.p
) to the following ones (parsed with pMany p
). From the type of <*>
we can infer that the resulting type will be Parser s [a]
as expected.This code is semantically equivalent to
pMany p = (do
someElem <- p
restElems <- pMany p
return (someElem : restElems)
) `opt` []
pMany1
does the same trick, but fails if it can't parse the first element. Note that it calls pMany
after that which doesn't have this property. As a result we force it to parse at least one thing ("parse one and then parse any number").