Tags: parsing, haskell, parsec

Parsec <|> choice in parser: an error is thrown, but the next parser is not tried


I am learning Haskell with Write Yourself a Scheme.

I am currently trying to implement character recognition in Scheme. A char is written #\<character> or #\<character-name>, like #\a or #\ or #\space.

So I wrote the following code:

-- .. some code ..
data LispVal = Atom String
             | List [LispVal]
             | DottedList [LispVal] LispVal
             | String String
             | Number Integer
             | Bool Bool
             | Char Char deriving Show
-- .... More code ...
parseChar :: Parser LispVal
parseChar = liftM Char (parseSingleChar <|> parseSpecialCharNotation)

parseSingleChar :: Parser Char
parseSingleChar = do string "#\\"
                     x <- letter
                     return x

parseSpecialCharNotation :: Parser Char
parseSpecialCharNotation = do string "#\\"
                              x <- (parseSpace <|> parseNewline)
                              return x

parseSpace :: Parser Char
parseSpace = do char 's'
                char 'p'
                char 'a'
                char 'c'
                char 'e'
                return ' '

parseNewline :: Parser Char
parseNewline = do char 'n'
                  char 'e'
                  char 'w'
                  char 'l'
                  char 'i'
                  char 'n'
                  char 'e'
                  return '\n'

-- .. some more code...

readExpr :: String -> String
readExpr input = case parse parseExpr "lisp" input of
                 Left err -> "Parse Error: " ++ show err
                 Right val -> "Found value: " ++ show val

At that point, I did not know about the string parser in Parsec.

The problem is that it recognizes #\a, but #\space is treated as an s.

*Main> readExpr "#\\space"
"Found value: Char 's'"

To resolve this problem, I changed parseChar to:

parseChar :: Parser LispVal
parseChar = liftM Char (parseSpecialCharNotation <|> parseSingleChar)

The earlier problem is solved, but now it gives me errors with normal characters:

*Main> readExpr "#\\s"
"Parse Error: \"lisp\" (line 1, column 4):\nunexpected end of input\nexpecting \"p\""

Why is that happening? Shouldn't it have moved on to parseSingleChar once parseSpecialCharNotation failed?

Full code at: Gist


Solution

  • From the documentation for <|>:

    The parser is called predictive since q is only tried when parser p didn't consume any input (i.e.. the look ahead is 1).

    In your case both parsers consume "#\\" before failing (for the input "#\\s", parseSpecialCharNotation has already consumed "#\\s" by the time it fails), so the other alternative is never tried. You can use try to ensure backtracking works as expected:

    The parser try p behaves like parser p, except that it pretends that it hasn't consumed any input when an error occurs.

    Something like this:

    try parseSpecialCharNotation <|> parseSingleChar
    

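    Side note: it is better to factor "#\\" out of the individual parsers, because otherwise you are doing the same work twice. Assuming the string "#\\" step is removed from both parseSpecialCharNotation and parseSingleChar, something like this: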

    do
      string "#\\"
      try parseSpecialCharNotation <|> parseSingleChar
    

    Also, you can use the string combinator instead of a series of char parsers.
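
Putting the suggestions above together, a minimal sketch might look like the following. It is not the asker's full program: it returns a plain Char instead of a LispVal, and the names parseCharLiteral and readChar, as well as the trailing eof check, are assumptions added for illustration. Here try is applied to each named alternative, so a partial match such as the 's' of "space" is given back before the single-letter parser runs.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Named characters. Each alternative is wrapped in 'try' so that a
-- partial match (e.g. the 's' of "space") is not counted as consumed
-- input when the full name does not match.
parseSpecialCharNotation :: Parser Char
parseSpecialCharNotation =
      try (' '  <$ string "space")
  <|> try ('\n' <$ string "newline")

-- A single letter, as in the question's parseSingleChar,
-- but without the "#\" prefix.
parseSingleChar :: Parser Char
parseSingleChar = letter

-- Hypothetical name: parses the shared "#\" prefix once,
-- then tries the named characters before a single letter.
parseCharLiteral :: Parser Char
parseCharLiteral = do
  _ <- string "#\\"
  parseSpecialCharNotation <|> parseSingleChar

-- Hypothetical helper: runs the parser and requires that the whole
-- input is consumed (the eof check is an addition, not in the question).
readChar :: String -> Either ParseError Char
readChar = parse (parseCharLiteral <* eof) "lisp"
```

Loaded in GHCi, readChar "#\\s" should give Right 's' and readChar "#\\space" should give Right ' ', while an incomplete name such as "#\\sp" is rejected by the eof check.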