Search code examples
haskellparsec

behaviors of the (<|>) operator of haskell parsec?


I want to use RPN interpreter to practice the use of Parsec, and here is my code:

module RPN where
import Control.Applicative hiding((<|>))
import Text.ParserCombinators.Parsec 
data RPNVal = Add|Sub|Mul|Div|Num Double deriving(Show)
a <:> b  = (:) <$> a <*> b
a <++> b = (++) <$> a <*> b
opSymbol = oneOf "+-*/"
number   = many1 digit :: Parser String
plus     = char '+' *> number
minus    = char '-' <:> number
integer  = (number <|> minus <|> plus) :: Parser String
float    = integer <++> decimal <++> exponent
    where decimal  = option "" $ char '.' <:> number
          exponent = option "" $ oneOf "eE" <:> number
parseOp :: Parser RPNVal
parseOp = do
    op <- opSymbol
    case op of 
         '+' -> return Add 
         '-' -> return Sub 
         '*' -> return Mul 
         '/' -> return Div
parseNum :: Parser RPNVal
parseNum = liftA (read::String->Double) float >>= return.Num
parseRPNVal :: Parser RPNVal 
-- parseRPNVal = parseOp <|> parseNum
-- parseRPNVal = parseNum <|> parseOp
parseRPNVal = try parseNum <|> parseOp
parseExpr :: Parser [RPNVal]
parseExpr = parseRPNVal `sepBy1` spaces
readExpr :: String->[RPNVal]
readExpr input = case parse parseExpr "RPN" input of
                      Left errMsg -> error $ show errMsg
                      Right val   -> val

in the definition of parseRPNVal, I found that if I use parseRPNVal = parseOp <|> parseNum, then the parser worked well, however, if I use parseRPNVal = parseNum <|> parseOp, then the function could only parse numbers while operators would cause error.

In Real World Haskell, it says:

This operator behaves like this: it will first try the parser on the left. If it consumed no input[35], it will try the parser on the right

so could anyone please explain the behavior of parseOp and parseNum to me?


Solution

  • It's just as Real World Haskell says.

    parseNum first parses an integer, but integer-s can be prefixed with + or -.

    Therefore, when parsing parseNum <|> parseOp, if the next character is + or -, and what we actually have is an operator, parseNum reads in that character and fails only afterwards.

    With parseOp <|> parseNum, no backtracking is needed, of course, since parseOp needs just one character lookahead.