Search code examples
haskellfunctional-programming

How to properly step through recursive parsing?


The code I have is this:

-- my parser, takes a function that takes a string and gives 
-- the suffix and the answer
data Parser a = MkParser (String -> Maybe (String, a))

unParser :: Parser a -> String -> Maybe (String, a)
unParser (MkParser sf1) = sf1

-- gives `a` as the answer, never fails, doesn't change input string
pure :: a -> Parser a
pure a = MkParser (\inp -> Just (inp, a))

-- Choice... if pa fails, try pa2
(<|>) :: Parser a -> Parser a -> Parser a
MkParser sf1 <|> p2 = MkParser g
  where
    g inp = case sf1 inp of
              Nothing -> unParser p2 inp
              j -> j        -- the Just case

-- | 0 or more times, collect the answers into a list.
many :: Parser a -> Parser [a]
many p = some p <|> pure []
-- Explanation: To repeat 0 or more times, first try 1 or more
-- times!  If that fails, then we know it's 0 times, and the answer is the
-- empty list.

-- | 1 or more times, collect the answers into a list.
some :: Parser a -> Parser [a]
some p = liftA2 (:) p (many p)
-- Explanation: To repeat 1 or more times, do 1 time, then 0 or more times!

liftA2 :: (a -> b -> c) -> Parser a -> Parser b -> Parser c
liftA2 op (MkParser sf1) p2 = MkParser g
  where
    g inp = case sf1 inp of
              Nothing -> Nothing
              Just (middle, a) ->
                  case unParser p2 middle of
                    Nothing -> Nothing
                    Just (rest, b) -> Just (rest, op a b)

and I test the code with this basic parser that simply gives success if it's the char I am looking for:

char :: Char -> Parser Char
char wanted = MkParser sf
  where
    sf (c:cs) | c == wanted = Just (cs, c)
    sf _ = Nothing

To run parsers, I use this

runParser :: Parser a -> String -> Maybe a
runParser (MkParser sf) inp = case sf inp of
                                Nothing -> Nothing
                                Just (_, a) -> Just a

Now I do runParser (many (char '!')) "testing" and I get Just "".

I'm trying to step through the code. First, (many (char '!')) calls some (char '!').

This calls liftA2 (:) p (many p). And here, many p calls liftA2 (:) p (many p) again. This repeats infinitely right? I'm new to Haskell, but I'm guessing because of it's lazy evaluation, it eventually tries to evaluate p?. runParser (char '!') "testing" gives Nothing therefore the first liftA2 (:) p (many p) gives Nothing as well right? This moves to the second case of some p <|> pure [] and pure [] gives Just []. So the answer should be Just [].

But the answer I get when I run runParser (many (char '!')) "Testing" is Just "". Where did the "" come from?

Related: Are these steps correct?

many (char '!') = some (char '!') <|> pure []
-- calls some p
some (char '!') = liftA2 (:) (char '!') (many (char '!'))
-- here, it ways for the third argument of liftA2, or does it
-- start trying to run liftA2 with the second argument?
-- (many (char '!')) calls this:
many (char '!') = some (char '!') <|> pure []
-- which then again calls 
some (char '!') = liftA2 (:) (char '!') (many (char '!'))
-- so now it looks like this:
liftA2 (:) (char '!') ( liftA2 (:) (char '!') (many (char '!')) )
-- When does it start evaluating? 
-- Because `(many (char '!')` can't technically run because
-- it needs to wait for the third argument of 
liftA2 (:) (char '!') (many (char '!'))
-- which never gives an answer since it keeps recursing right?

If I had to take a guess:

many (char '!') = some (char '!') <|> pure []
some (char '!') = liftA2 (:) (char '!') (many (char '!'))
-- run `liftA2 (:) (char '!')`, get `Nothing`
-- go to the second choice of <|>
<|> pure []
-- give Just []
Just []

Is this right?

1 more Edit: (Sorry I should have added these questions in the original post but since they are all related I don't want to create another question)

If I do

runParser (many (char '!')) "!esting"

The steps are:

(many (char '!'))
many p = some p <|> pure []
-- calls `some p`
some (char '!') = liftA2 (:) (char '!') (many (char '!'))
-- the first `char !` (second argument) runs, and it is a success.
-- the success gives Just ("esting", '!') to the third
-- argument of liftA2.
-- Now it goes to the third argument of liftA2, which is
(many (char '!')
-- which calls
many p = some p <|> pure [] -- but the string is "esting"
-- this calls
some (char '!') = liftA2 (:) (char '!') (many (char '!'))
-- the string is "esting"
-- this fails, so it tries the second choice of
some p <|> pure []
-- which is `pure []` which just gives []

-- Now we go back to the first
liftA2 (:) (char '!') (many (char '!'))
-- where the first answer is '!' and the second answer is '[]'
-- and it combines it with `:`. So we have
-- Just ("sting", '!' : '[]')

Solution

  • Note that "" is the same as [].

    In Haskell the empty list in type [a] is printed as [], except when a = Char. For [Char], AKA String, it is printed as "".

    Non-empty lists of Char also have a special notation: instead of ['H', 'e', 'l', 'l', 'o'] we get "Hello" as output. Still, those two values are exactly the same.


    About your computation steps, the order of evaluation is tricky in Haskell since when calling f x y z it is f that decides what to evaluate among its arguments. It could evaluate x first, and then depending on the result evaluate y or z. Or, instead, it could start by evaluating z first...

    Anyway, fortunately the order does not matter that much. As long as you expand the definitions as you did, and as long as you don't get stuck in an infinite loop (or other bottom value), your steps are semantically correct. In the worst case, you end up evaluating more than needed, but the end result it's always the same.

    many (char '!') = some (char '!') <|> pure []
    -- calls some p
    

    Correct, since <|> decides to try the first parser first.

    some (char '!') = liftA2 (:) (char '!') (many (char '!'))
    -- here, it ways for the third argument of liftA2, or does it
    -- start trying to run liftA2 with the second argument?
    

    Roughly, liftA2 starts with the second argument. The third might get evaluated later depending on that.

    -- (many (char '!')) calls this:
    many (char '!') = some (char '!') <|> pure []
    -- which then again calls 
    some (char '!') = liftA2 (:) (char '!') (many (char '!'))
    -- so now it looks like this:
    liftA2 (:) (char '!') ( liftA2 (:) (char '!') (many (char '!')) )
    -- When does it start evaluating?
    

    This is correct, but we never really get to that. char '!' on your input string fails with Nothing, and seeing that liftA2 stops the computation with result Nothing.

    We now try the second side of .. <|> pure [], hence the final Just [], or rather Just "" as it is actually printed.

    (Note that this execution is what happens on your input string "Testing". On other strings we may get a different behavior, of course.)