parsing haskell recursion functional-programming stack-overflow

Stack overflow with two functions calling each other in Applicative parser

I'm doing data61's course: https://github.com/data61/fp-course. In the parser one, the following implementation will cause parse (list1 (character *> valueParser 'v')) "abc" stack overflow.

Existing code:

data List t =
  Nil
  | t :. List t
  deriving (Eq, Ord)

-- Right-associative
infixr 5 :.

type Input = Chars

data ParseResult a =
    UnexpectedEof
  | ExpectedEof Input
  | UnexpectedChar Char
  | UnexpectedString Chars
  | Result Input a
  deriving Eq

instance Show a => Show (ParseResult a) where
  show UnexpectedEof =
    "Unexpected end of stream"
  show (ExpectedEof i) =
    stringconcat ["Expected end of stream, but got >", show i, "<"]
  show (UnexpectedChar c) =
    stringconcat ["Unexpected character: ", show [c]]
  show (UnexpectedString s) =
    stringconcat ["Unexpected string: ", show s]
  show (Result i a) =
    stringconcat ["Result >", hlist i, "< ", show a]

instance Functor ParseResult where
  _ <$> UnexpectedEof =
    UnexpectedEof
  _ <$> ExpectedEof i =
    ExpectedEof i
  _ <$> UnexpectedChar c =
    UnexpectedChar c
  _ <$> UnexpectedString s =
    UnexpectedString s
  f <$> Result i a =
    Result i (f a)

-- Function to determine is a parse result is an error.
isErrorResult ::
  ParseResult a
  -> Bool
isErrorResult (Result _ _) =
  False
isErrorResult UnexpectedEof =
  True
isErrorResult (ExpectedEof _) =
  True
isErrorResult (UnexpectedChar _) =
  True
isErrorResult (UnexpectedString _) =
  True

-- | Runs the given function on a successful parse result. Otherwise return the same failing parse result.
onResult ::
  ParseResult a
  -> (Input -> a -> ParseResult b)
  -> ParseResult b
onResult UnexpectedEof _ = 
  UnexpectedEof
onResult (ExpectedEof i) _ = 
  ExpectedEof i
onResult (UnexpectedChar c) _ = 
  UnexpectedChar c
onResult (UnexpectedString s)  _ = 
  UnexpectedString s
onResult (Result i a) k = 
  k i a

data Parser a = P (Input -> ParseResult a)

parse ::
  Parser a
  -> Input
  -> ParseResult a
parse (P p) =
  p

-- | Produces a parser that always fails with @UnexpectedChar@ using the given character.
unexpectedCharParser ::
  Char
  -> Parser a
unexpectedCharParser c =
  P (\_ -> UnexpectedChar c)

--- | Return a parser that always returns the given parse result.
---
--- >>> isErrorResult (parse (constantParser UnexpectedEof) "abc")
--- True
constantParser ::
  ParseResult a
  -> Parser a
constantParser =
  P . const

-- | Return a parser that succeeds with a character off the input or fails with an error if the input is empty.
--
-- >>> parse character "abc"
-- Result >bc< 'a'
--
-- >>> isErrorResult (parse character "")
-- True
character ::
  Parser Char
character = P p
  where p Nil = UnexpectedString Nil
        p (a :. as) = Result as a

-- | Parsers can map.
-- Write a Functor instance for a @Parser@.
--
-- >>> parse (toUpper <$> character) "amz"
-- Result >mz< 'A'
instance Functor Parser where
  (<$>) ::
    (a -> b)
    -> Parser a
    -> Parser b
  f <$> P p = P p'
    where p' input = f <$> p input 

-- | Return a parser that always succeeds with the given value and consumes no input.
--
-- >>> parse (valueParser 3) "abc"
-- Result >abc< 3
valueParser ::
  a
  -> Parser a
valueParser a = P p
  where p input = Result input a

-- | Return a parser that tries the first parser for a successful value.
--
--   * If the first parser succeeds then use this parser.
--
--   * If the first parser fails, try the second parser.
--
-- >>> parse (character ||| valueParser 'v') ""
-- Result >< 'v'
--
-- >>> parse (constantParser UnexpectedEof ||| valueParser 'v') ""
-- Result >< 'v'
--
-- >>> parse (character ||| valueParser 'v') "abc"
-- Result >bc< 'a'
--
-- >>> parse (constantParser UnexpectedEof ||| valueParser 'v') "abc"
-- Result >abc< 'v'
(|||) ::
  Parser a
  -> Parser a
  -> Parser a
P a ||| P b = P c
  where c input
          | isErrorResult resultA = b input
          | otherwise = resultA
            where resultA = a input

infixl 3 |||

My code:

instance Monad Parser where
  (=<<) ::
    (a -> Parser b)
    -> Parser a
    -> Parser b
  f =<< P a = P p
    where p input = onResult (a input) (\i r -> parse (f r) i)

instance Applicative Parser where
  (<*>) ::
    Parser (a -> b)
    -> Parser a
    -> Parser b
  P f <*> P a = P b
    where b input = onResult (f input) (\i f' -> f' <$> a i)

list ::
  Parser a
  -> Parser (List a)
list p = list1 p ||| pure Nil

list1 ::
  Parser a
  -> Parser (List a)
list1 p = (:.) <$> p <*> list p

However, if I change list to not use list1, or use =<< in list1, it works fine. It also works if <*> uses =<<. I feel like it might be an issue with tail recursion.

UPDATE:

If I use lazy pattern matching here

  P f <*> ~(P a) = P b
    where b input = onResult (f input) (\i f' -> f' <$> a i)

It works fine. Pattern matching here is the problem. I don't understand this... Please help!

Solution

If I use lazy pattern matching P f <*> ~(P a) = ... then it works fine. Why?

This very issue was discussed recently. You could also fix it by using newtype instead of data: newtype Parser a = P (Input -> ParseResult a).(*)

The definition of list1 wants to know both parser arguments to <*>, but actually when the first will fail (when input is exhausted) we don't need to know the second! But since we force it, it will force its second argument, and that one will force its second parser, ad infinitum.(**) That is, p will fail when input is exhausted, but we have list1 p = (:.) <$> p <*> list p which forces list p even though it won't run when the preceding p fails. That's the reason for the infinite looping, and why your fix with the lazy pattern works.

What is the difference between data and newtype in terms of laziness?

(*)newtype'd type always has only one data constructor, and pattern matching on it does not actually force the value, so it is implicitly like a lazy pattern. Try newtype P = P Int, let foo (P i) = 42 in foo undefined and see that it works.

(**) This happens when the parser is still prepared, composed; before the combined, composed parser even gets to run on the actual input. This means there's yet another, third way to fix the problem: define

list1 p = (:.) <$> p <*> P (\s -> parse (list p) s)

This should work regardless of the laziness of <*> and whether data or newtype was used.

Intriguingly, the above definition means that the parser will be actually created during run time, depending on the input, which is the defining characteristic of Monad, not Applicative which is supposed to be known statically, in advance. But the difference here is that the Applicative depends on the hidden state of input, and not on the "returned" value.