I want to use RPN interpreter to practice the use of Parsec, and here is my code:
module RPN where
import Control.Applicative hiding((<|>))
import Text.ParserCombinators.Parsec
data RPNVal = Add|Sub|Mul|Div|Num Double deriving(Show)
a <:> b = (:) <$> a <*> b
a <++> b = (++) <$> a <*> b
opSymbol = oneOf "+-*/"
number = many1 digit :: Parser String
plus = char '+' *> number
minus = char '-' <:> number
integer = (number <|> minus <|> plus) :: Parser String
float = integer <++> decimal <++> exponent
where decimal = option "" $ char '.' <:> number
exponent = option "" $ oneOf "eE" <:> number
parseOp :: Parser RPNVal
parseOp = do
op <- opSymbol
case op of
'+' -> return Add
'-' -> return Sub
'*' -> return Mul
'/' -> return Div
parseNum :: Parser RPNVal
parseNum = liftA (read::String->Double) float >>= return.Num
parseRPNVal :: Parser RPNVal
-- parseRPNVal = parseOp <|> parseNum
-- parseRPNVal = parseNum <|> parseOp
parseRPNVal = try parseNum <|> parseOp
parseExpr :: Parser [RPNVal]
parseExpr = parseRPNVal `sepBy1` spaces
readExpr :: String->[RPNVal]
readExpr input = case parse parseExpr "RPN" input of
Left errMsg -> error $ show errMsg
Right val -> val
in the definition of parseRPNVal
, I found that if I use parseRPNVal = parseOp <|> parseNum
, then the parser worked well, however, if I use parseRPNVal = parseNum <|> parseOp
, then the function could only parse numbers while operators would cause error.
In Real World Haskell, it says:
This operator behaves like this: it will first try the parser on the left. If it consumed no input[35], it will try the parser on the right
so could anyone please explain the behavior of parseOp
and parseNum
to me?
It's just as Real World Haskell says.
parseNum
first parses an integer
, but integer
-s can be prefixed with +
or -
.
Therefore, when parsing parseNum <|> parseOp
, if the next character is +
or -
, and what we actually have is an operator, parseNum
reads in that character and fails only afterwards.
With parseOp <|> parseNum
, no backtracking is needed, of course, since parseOp
needs just one character lookahead.