While reading Haskell book I came across trifecta
I'm trying to wrap my head around but still not able to understand <|>
I have following questions.
in simple words (<|>) = Monadic Choose ?
p = a <|> b -- use parser a if not then use b ?
if yes then why following parser is failing ?
parseFraction :: Parser Rational
parseFraction = do
numerator <- decimal
char '/'
denominator <- decimal
case denominator of
0 -> fail "denominator cannot be zero"
_ -> return (numerator % denominator)
type RationalOrDecimal = Either Rational Integer
parseRationalOrDecimal = (Left <$> parseFraction) <|> (Right<$> decimal)
main = do
let p f i = parseString f mempty i
print $ p (some (skipMany (oneOf "\n") *> parseRationalOrDecimal <* skipMany (oneOf "\n"))) "10"
in perfect world if a is parseFraction is going to fail then <|>
should go with decimal but this is not the case.
but when I use try it works.
parseRationalOrDecimal = try (Left <$> parseFraction) <|> (Right<$> decimal)
The reason is beacuse parseFraction
consumes input before failing therefore, it is considered to be the correct branch in the choice. Let me give you and example:
Let say you are writing a python parser and you have to decide if a declaration is a class
or a function (keyword def
), then you write
parseExpresion = word "def" <|> word "class" -- DISCLAIMER: using a ficticious library
Then if the user writes def
or class
it will match, but if the user writes det
It will try the first branch and match de
and then fail to match expected f
because t
was found. It will not bother to try the next parser, because the error is considered to be in the first branch. It'd make little sense to try the class
parser since likely, the error is in the first branch.
In your case parseFraction
matches some digits and then fails because /
isn't found, and then it doesn't bother to try decimal
parser.
This is a desing decision, some other libraries use a different convention (ex: Attoparsec
always backtrack on failure), and some functions claim to "not consume input" (ex: notFollowedBy
)
Notice that there is a trade-off here:
First: If <|>
behaves as you expect the following
parse parseRationalOrDecimal "123456789A"
will first parse all numbers until "A" is found and then it will parse again! all numbers until "A" is found... so doing the same computation twice just to return a failure.
Second: If you care more about error messages the current behaviour is more convinient. Following the python example, imagine:
parseExpresion = word "def" <|> word "class" <|> word "import" <|> word "type" <|> word "from"
If the user types "frmo" the, the parser will go to the last branch and will raise and error like expected "from" but "frmo" was found
Whereas, if all alternatives must be checked the error would be something more like expected one of "def", "class", "import", "type" of "from"
which is less close to the actual typo.
As I said, it is a library desing decision, I am just trying to convince you that there are good reasons to not try all alternatives automatically, and use try
if you explicitly want to do so.