Search code examples
haskellmegaparsec

Operator precedence in megaparsec


I'm having trouble using Megaparsec 6's makeExprParser helper. I can't seem to figure out how to bind both binary ^ and unary - at the precedence levels I'd expect.

Using this makeExprParser expression parser:

expressionParser :: Parser Expression
expressionParser =
    makeExprParser termParser
      [
        [InfixR $ BinOp BinaryExp <$ symbol "^"],
        [
          Prefix $ MonOp MonoMinus <$ symbol "-",
          Prefix $ MonOp MonoPlus <$ symbol "+"
        ],
        [
          InfixL $ BinOp BinaryMult <$ symbol "*",
          InfixL $ BinOp BinaryDiv <$ symbol "/"
        ],
        [
          InfixL $ BinOp BinaryPlus <$ symbol "+",
          InfixL $ BinOp BinaryMinus <$ symbol "-"
        ]
      ]

I would expect these tests to pass:

testEqual expressionParser "1^2" "(1)^(2)"
testEqual expressionParser "-1^2" "-(1^2)"
testEqual expressionParser "1^-2" "1^(-2)"
testEqual expressionParser "-1^-2" "-(1^(-2))"

That is, -1^-2 should parse as the same thing as -(1^(-2)). This is how e.g. Python parses it:

>>> 2**-2
0.25
>>> -2**-2
-0.25
>>> -2**2
-4

and Ruby:

irb(main):004:0> 2**-2
=> (1/4)
irb(main):005:0> -2**-2
=> (-1/4)
irb(main):006:0> -2**2
=> -4

But this Megaparsec parser instead fails to parse 1^-2 at all, instead giving me the helpful error:

(TrivialError (SourcePos {sourceName = \"test.txt\", sourceLine = Pos 1, sourceColumn = Pos 3} :| []) (Just (Tokens ('-' :| \"\"))) (fromList [Tokens ('(' :| \"\"),Label ('i' :| \"nteger\")]))")

which I read to say "I could have taken any of these characters here, but that - has me flummoxed".

If I adjust some of the precedence of the operator table like this (moving the exponent after the unary -):

expressionParser =
    makeExprParser termParser
      [
        [
          Prefix $ MonOp MonoMinus <$ symbol "-",
          Prefix $ MonOp MonoPlus <$ symbol "+"
        ],
        [InfixR $ BinOp BinaryExp <$ symbol "^"],
        [
          InfixL $ BinOp BinaryMult <$ symbol "*",
          InfixL $ BinOp BinaryDiv <$ symbol "/"
        ],
        [
          InfixL $ BinOp BinaryPlus <$ symbol "+",
          InfixL $ BinOp BinaryMinus <$ symbol "-"
        ]
      ]

then I no longer get a parse failure, but -1^2 incorrectly parses as (-1)^2 (instead of the correct -(1^2)).

Here is a complete self-contained parser to show the problem (it requires HUnit and of course megaparsec):

module Hascas.Minimal where

import Data.Void (Void)
import Test.HUnit hiding (test)
import Text.Megaparsec hiding (ParseError)
import Text.Megaparsec.Char
import Text.Megaparsec.Expr
import qualified Text.Megaparsec as MP
import qualified Text.Megaparsec.Char.Lexer as L

data Expression
    = Literal Integer
    | MonOp MonoOperator Expression
    | BinOp BinaryOperator Expression Expression
  deriving (Read, Show, Eq, Ord)

data BinaryOperator
    = BinaryPlus
    | BinaryMinus
    | BinaryDiv
    | BinaryMult
    | BinaryExp
  deriving (Read, Show, Eq, Ord)

data MonoOperator
    = MonoPlus
    | MonoMinus
  deriving (Read, Show, Eq, Ord)

type Parser a = Parsec Void String a
type ParseError = MP.ParseError (Token String) Void

spaceConsumer :: Parser ()
spaceConsumer = L.space space1 lineComment blockComment
  where
    lineComment  = L.skipLineComment "//"
    blockComment = L.skipBlockComment "/*" "*/"

lexeme :: Parser a -> Parser a
lexeme = L.lexeme spaceConsumer

symbol :: String -> Parser String
symbol = L.symbol spaceConsumer

expressionParser :: Parser Expression
expressionParser =
    makeExprParser termParser
      [
        [InfixR $ BinOp BinaryExp <$ symbol "^"],
        [
          Prefix $ MonOp MonoMinus <$ symbol "-",
          Prefix $ MonOp MonoPlus <$ symbol "+"
        ],
        [
          InfixL $ BinOp BinaryMult <$ symbol "*",
          InfixL $ BinOp BinaryDiv <$ symbol "/"
        ],
        [
          InfixL $ BinOp BinaryPlus <$ symbol "+",
          InfixL $ BinOp BinaryMinus <$ symbol "-"
        ]
      ]

termParser :: Parser Expression
termParser = (
        (try $ Literal <$> L.decimal)
    <|> (try $ parens expressionParser))

parens :: Parser a -> Parser a
parens x = between (symbol "(") (symbol ")") x

main :: IO ()
main = do
    -- just to show that it does work in the + case:
    test expressionParser "1+(-2)" $
      BinOp BinaryPlus (Literal 1) (MonOp MonoMinus $ Literal 2)
    test expressionParser "1+-2" $
      BinOp BinaryPlus (Literal 1 ) (MonOp MonoMinus $ Literal 2)

    -- but not in the ^ case
    test expressionParser "1^-2" $
      BinOp BinaryExp (Literal 1) (MonOp MonoMinus $ Literal 2)
    test expressionParser "-1^2" $
      MonOp MonoMinus $ BinOp BinaryExp (Literal 1) (Literal 2)
    test expressionParser "-1^-2" $
      MonOp MonoMinus $ BinOp BinaryExp (Literal 1) (MonOp MonoMinus $ Literal 2)

    -- exponent precedence is weird
    testEqual expressionParser "1^2" "(1)^(2)"
    testEqual expressionParser "-1^2" "-(1^2)"
    testEqual expressionParser "1^-2" "1^(-2)"
    testEqual expressionParser "-1^-2" "-(1^(-2))"
    testEqual expressionParser "1^2^3^4" "1^(2^(3^(4))))"
  where
    test :: (Eq a, Show a) => Parser a -> String -> a -> IO ()
    test parser input expected = do
      assertEqual input (Right expected) $ parse (spaceConsumer >> parser <* eof) "test.txt" input

    testEqual :: (Eq a, Show a) => Parser a -> String -> String -> IO ()
    testEqual parser input expected = do
        assertEqual input (p expected) (p input)
      where
        p i = parse (spaceConsumer >> parser <* eof) "test.txt" i

Is it possible to get Megaparsec to parse these operators at the precedence levels that other languages do?


Solution

  • makeExprParser termParser [precN, ..., prec1] will produce an expression parser that works in such a way that each level of precedence invokes the next-higher level of precedence. So if you'd manually define it, you'd have a rule for infix + and -, which uses the mult-and-div rule as the operands. That in turn would use the prefix rule as the operands and that would use the ^ rule as the operand. Finally the ^ rule uses termParser for the operands.

    The important thing to note here is that the ^ rule (or more generally: any rule with a higher precedence than the prefix operators) invokes a parser that won't accept prefix operators at the beginning. So prefix operators can't appear on the right of such operators (except inside parentheses).

    What this basically means is that your use case is not supported by makeExprParser.

    To work around this, you can use makeExprParser to only handle the infix operators with a lower precedence than the prefix operators and then handle the prefix operators and ^ manually, so that the right operand of ^ would "loop back" to the prefix operators. Something like this:

    expressionParser =
        makeExprParser prefixParser
          [
            [
              InfixL $ BinOp BinaryMult <$ symbol "*",
              InfixL $ BinOp BinaryDiv <$ symbol "/"
            ],
            [
              InfixL $ BinOp BinaryPlus <$ symbol "+",
              InfixL $ BinOp BinaryMinus <$ symbol "-"
            ]
          ]
    
    prefixParser =
      do
        prefixOps <- many prefixOp
        exp <- exponentiationParser
        return $ foldr ($) exp prefixOps
      where
        prefixOp = MonOp MonoMinus <$ symbol "-" <|> MonOp MonoPlus <$ symbol "+"
    
    exponentiationParser =
      do
        lhs <- termParser
        -- Loop back up to prefix instead of going down to term
        rhs <- optional (symbol "^" >> prefixParser)
        return $ maybe lhs (BinOp BinaryExp lhs) rhs
    

    Note that, unlike makeExprParser, this also allows multiple consecutive prefix operators (like --x for double negation). If you don't want that, replace many with optional in the definition of prefixParser.