Search code examples
haskelltrifecta

understanding trifecta parser <|> and try


While reading Haskell book I came across trifecta

I'm trying to wrap my head around but still not able to understand <|>

I have following questions.

in simple words (<|>) = Monadic Choose ?

p = a <|> b -- use parser a if not then use b ?

if yes then why following parser is failing ?

parseFraction :: Parser Rational
parseFraction = do
    numerator <- decimal
    char '/'
    denominator <- decimal
    case denominator of 
        0 -> fail "denominator cannot be zero"
        _ -> return (numerator % denominator)


type RationalOrDecimal = Either Rational Integer
parseRationalOrDecimal = (Left <$> parseFraction) <|> (Right<$> decimal)


main = do
    let p f i = parseString f mempty i
    print $ p (some (skipMany (oneOf "\n")  *> parseRationalOrDecimal <* skipMany (oneOf "\n"))) "10"

in perfect world if a is parseFraction is going to fail then <|> should go with decimal but this is not the case. but when I use try it works.

  1. what I'm missing ?
  2. why we need to use try when <|> should run second parser on first failure ?

parseRationalOrDecimal = try (Left <$> parseFraction) <|> (Right<$> decimal)


Solution

  • The reason is beacuse parseFraction consumes input before failing therefore, it is considered to be the correct branch in the choice. Let me give you and example:

    Let say you are writing a python parser and you have to decide if a declaration is a class or a function (keyword def), then you write

    parseExpresion = word "def" <|> word "class" -- DISCLAIMER: using a ficticious library
    

    Then if the user writes def or class it will match, but if the user writes det It will try the first branch and match de and then fail to match expected f because t was found. It will not bother to try the next parser, because the error is considered to be in the first branch. It'd make little sense to try the class parser since likely, the error is in the first branch.

    In your case parseFraction matches some digits and then fails because / isn't found, and then it doesn't bother to try decimal parser.

    This is a desing decision, some other libraries use a different convention (ex: Attoparsec always backtrack on failure), and some functions claim to "not consume input" (ex: notFollowedBy)

    Notice that there is a trade-off here:

    First: If <|> behaves as you expect the following

    parse parseRationalOrDecimal "123456789A"
    

    will first parse all numbers until "A" is found and then it will parse again! all numbers until "A" is found... so doing the same computation twice just to return a failure.

    Second: If you care more about error messages the current behaviour is more convinient. Following the python example, imagine:

    parseExpresion = word "def" <|> word "class" <|> word "import" <|> word "type" <|> word "from"
    

    If the user types "frmo" the, the parser will go to the last branch and will raise and error like expected "from" but "frmo" was found Whereas, if all alternatives must be checked the error would be something more like expected one of "def", "class", "import", "type" of "from" which is less close to the actual typo.

    As I said, it is a library desing decision, I am just trying to convince you that there are good reasons to not try all alternatives automatically, and use try if you explicitly want to do so.