
"Unexpected end of input - expecting end of input" in parsec


Consider this Parsec parser (put it in a file named parsec-eof-test.hs):

import Text.Parsec
import Text.Parsec.String

main = do
  x <- parse (manyTill anyChar eof >> fail "forced fail") "" <$> readFile "parsec-eof-test.hs"
  print (x :: Either ParseError String)

If you run it, you get:

Left (line 7, column 1):
unexpected end of input
expecting end of input
forced fail

"unexpected end of input" followed by "expecting end of input" doesn't make any sense; it's a contradiction.

What's going on here?

Is it a bad default in Parsec, or is what I'm looking at actually a stack of potential errors that Parsec came across while parsing?

Since my parser, manyTill anyChar eof, consumes input, I'd expect the only error message emitted to be "forced fail". What am I missing?


Solution

  • It works as expected with Megaparsec:

    module Main (main) where
    
    import Data.Void
    import Text.Megaparsec
    
    type Parser = Parsec Void String
    
    main :: IO ()
    main = do
      -- Megaparsec 7+ names: 'anySingle' replaces the old 'anyChar', and
      -- 'parseTest' now pretty-prints errors (formerly 'parseTest'').
      parseTest (manyTill anySingle eof >> fail "forced fail" :: Parser String)
        "somethingsomething"
    

    When I run it, I get:

    1:19:
      |
    1 | somethingsomething
      |                   ^
    forced fail
    

    The reason you get that error message with Parsec is that (if I remember correctly) Parsec internally uses a lot of questionable "conventions". One such convention is that "unexpected end of input" is represented by the absence of other unexpected items. This is just asking for trouble. When I started turning Parsec into Megaparsec, I was scared that this was Haskell's "industrial-strength" parsing library.

    Why do you get no other unexpected components? Because fail, which is what actually makes your parser fail (as you expect), does not generate any. Nor does it generate expected components, but those (in your case, the "expecting end of input" part) are picked up and merged into your fail error message because of this feature:

    λ> parseTest (many (char 'a') *> many (char 'b') *> empty) ""
    parse error at (line 1, column 1):
    unexpected end of input
    expecting "a" or "b"
    

    Parsec is clever enough to figure out that "a" and "b" are still possible here. That makes sense in this case, but it lets you down with fail.

    Error messages in Parsec are a crazy thing; the code is absolutely insane once you start reading it critically.

    P.S. I don't mean to be mean, but let's call a spade a spade.
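The merging described above can be observed directly in Parsec: a ParseError is a collection of messages, and Text.Parsec.Error exposes them through errorMessages and the Message constructors (SysUnExpect, UnExpect, Expect, Message). The following is a minimal sketch, assuming parsec 3.1 (the version that ships with GHC). The names forcedFail, describeMessage and failMessages are hypothetical helpers, and keeping only the Message constructor is just one way to recover the text passed to fail while ignoring what was merged in from eof.

```haskell
module Main (main) where

import Text.Parsec
import Text.Parsec.Error (Message (..), errorMessages, messageString)
import Text.Parsec.String (Parser)

-- The parser from the question (hypothetical name).
forcedFail :: Parser String
forcedFail = manyTill anyChar eof >> fail "forced fail"

-- Show each message together with its constructor, so you can see which
-- part of the printed error comes from where (hypothetical helper).
describeMessage :: Message -> String
describeMessage (SysUnExpect s) = "SysUnExpect " ++ show s
describeMessage (UnExpect s)    = "UnExpect " ++ show s
describeMessage (Expect s)      = "Expect " ++ show s
describeMessage (Message s)     = "Message " ++ show s

-- Keep only the raw messages produced by 'fail', dropping the
-- unexpected/expected entries merged in from 'eof' (hypothetical helper).
failMessages :: ParseError -> [String]
failMessages err = [messageString m | m@(Message _) <- errorMessages err]

main :: IO ()
main = case parse forcedFail "" "something" of
  Left err -> do
    -- Every message the error carries, including the merged ones.
    mapM_ (putStrLn . describeMessage) (errorMessages err)
    -- Only what 'fail' contributed.
    print (failMessages err)
  Right s -> putStrLn ("unexpected success: " ++ s)
```

The full message list should include an Expect "end of input" entry alongside the Message "forced fail" one, while failMessages should yield only the "forced fail" text.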