Search code examples
parsinghaskellparsec

Why doesn't Parsec allow white spaces before operators in the table for buildExpressionParser


In the code below I can correctly parse white spaces after each of the tokens using Parsec:

whitespace = skipMany (space <?> "")

number :: Parser Integer
number = result <?> "number"
  where
  result = do {
    ds <- many1 digit;
    whitespace;
    return (read ds)
  }

table = result
  where
  result = [
    [Infix (genParser '*' (*)) AssocLeft, 
     Infix (genParser '/' div) AssocLeft],
    [Infix (genParser '+' (+)) AssocLeft, 
     Infix (genParser '-' (-)) AssocLeft]]
  genParser s f = char s >> whitespace >> return f

factor = parenExpr <|> number <?> "parens or number"
  where
  parenExpr = do {
    char '(';
    x <- expr;
    char ')';
    whitespace;
    return x
  }

expr :: Parser Integer
expr = buildExpressionParser table factor <?> "expression"

However I get a parse error when trying to only parse white spaces before, and after the operators:

whitespace = skipMany (space <?> "")

number :: Parser Integer
number = result <?> "number"
  where
  result = do {
    ds <- many1 digit;
    return (read ds)
  }

table = result
  where
  result = [
    [Infix (genParser '*' (*)) AssocLeft, 
     Infix (genParser '/' div) AssocLeft],
    [Infix (genParser '+' (+)) AssocLeft, 
     Infix (genParser '-' (-)) AssocLeft]]
  genParser s f = whitespace >> char s >> whitespace >> return f

factor = parenExpr <|> number <?> "parens or number"
  where
  parenExpr = do {
    char '(';
    x <- expr;
    char ')';
    return x
  }

expr :: Parser Integer
expr = buildExpressionParser table factor <?> "expression"

The parse error is:

$ ./parsec_example < <(echo "2 * 2 * 3")
"(stdin)" (line 2, column 1):
unexpected end of input
expecting "*"

Why does this happen? Is there some other way to parse white space around just the operators?


Solution

  • When I test your code, 2 * 2 * 3 parses correctly, but 2 + 2 does not. Parsing fails because the parser for * consumes some input and backtracking isn't enabled at that position, so other parsers cannot be tried.

    An expression parser created by buildExpressionParser tries to parse each operator in turn until one succeeds. When parsing 2 + 2, the following occurs:

    • The first 2 is matched by number. The rest of the input is  + 2 (note the space at the beginning).
    • The parser genParser '*' (*) is applied to the input. It consumes the space, but does not match the + character.
    • The other infix operator parsers automatically fail because some input was consumed by genParser '*' (*).

    You can fix this by wrapping the critical part of the parser in try. This saves the input until after char s succeeds. If char s fails, then buildExpressionParser can backtrack and try another infix operator.

    genParser s f = try (whitespace >> char s) >> whitespace >> return f
    

    The drawback of this parser is that, because it backtracks to before the leading whitespace before an infix operator, it repeatedly scans whitespace. It is usually better to parse whitespace after a successful match, like the OP's first parser example.