Search code examples
parsinghaskellparsec

Why does try not trigger backtracking in this example


I am trying to wrap my head around writing parser using parsec in Haskell, in particular how backtracking works.

Take the following simple parser:

import Text.Parsec

type Parser = Parsec String () String

parseConst :: Parser
parseConst =  do {
    x <- many digit;
    return $ read x
}

parseAdd :: Parser
parseAdd = do {
    l <- parseExp;
    char '+';
    r <- parseExp;
    return $ l <> "+" <> r
}

parseExp :: Parser
parseExp = try parseConst <|> parseAdd

pp :: Parser
pp = parseExp <* eof

test = parse pp "" "1+1"

test has value

Left (line 1, column 2):
unexpected '+'
expecting digit or end of input

In my mind this should succeed since I used the try combinator on parseConst in the definition of parseExp.

What am I missing? I am also interrested in pointers for how to debug this in my own, I tried using parserTraced which just allowed me to conclude that it indeed wasn't backtracking.

PS. I know this is an awful way to write an expression parser, but I'd like to understand why it doesn't work.


Solution

  • There are a lot of problems here.

    First, parseConst can never work right. The type says it must produce a String, so read :: String -> String. That particular Read instance requires the input be a quoted string, so being passed 0 or more digit characters to read is always going to result in a call to error if you try to evaluate the value it produces.

    Second, parseConst can succeed on matching zero characters. I think you probably wanted some instead of many. That will make it actually fail if it encounters input that doesn't start with a digit.

    Third, (<|>) doesn't do what you think. You might think that (a <* c) <|> (b <* c) is interchangeable with (a <|> b) <* c, but it isn't. There is no way to throw try in and make it the same, either. The problem is that (<|>) commits to whichever branch succeed, if one does. In (a <|> b) <* c, if a matches, there's no way later to backtrack and try b there. Doesn't matter how you lob try around, it can't undo the fact that (<|>) committed to a. In contrast, (a <* c) <|> (b <* c) doesn't commit until both a and c or b and c match the input.

    This is the situation you're encountering. You have (try parseConst <|> parseAdd) <* eof, after a bit of inlining. Since parseConst will always succeed (see the second issue), parseAdd will never get tried, even if the eof fails. So after parseConst consumes zero or more leading digits, the parse will fail unless that's the end of the input. Working around this essentially requires carefully planning your grammar such that any use of (<|>) is safe to commit locally. That is, the contents of each branch must not overlap in a way that is disambiguated only by later portions of the grammar.

    Note that this unpleasant behavior with (<|>) is how the parsec family of libraries work, but not how all parser libraries in Haskell work. Other libraries work without the left bias or commit behavior the parsec family have chosen.