I would like to parse arithmetic expressions.
Here is my current parser:
data AExpr
= ExprAsAExpr Expr
| IntConst Integer
| Neg AExpr
| ABinary ABinOp AExpr AExpr
deriving (Show, Eq)
aExpr :: Parser AExpr
aExpr = makeExprParser aTerm aOperators
aTerm :: Parser AExpr
aTerm
= parens aExpr
<|> IntConst <$> integerParser
aOperators :: [[Operator Parser AExpr]]
aOperators =
[ [Prefix (Neg <$ symbol "-") ]
, [ InfixL (ABinary Multiply <$ symbol "*")
, InfixL (ABinary Divide <$ symbol "/") ]
, [ InfixL (ABinary Add <$ symbol "+")
, InfixL (ABinary Subtract <$ symbol "-") ]
]
Using this I can correctly parse this:
1 + 2
Generating an AST like this.
(ABinary Add (IntConst 1) (IntConst 2))
Another thing I can parse are general expressions. These can be things such as variables, method calls, ternaries etc.
E.g.
Identifiers:
varName
This generates:
(Identifier (Name "varName"))
Method Calls:
methodCall()
This generates:
(MethodCall (Name "methodCall") (BlockExpr []))
Here's an example for parsing general expressions.
expressionParser :: Parser Expr
expressionParser
= methodCallParser
<|> identifierParser
This works fine but I would also like to parse arithmetic expressions in this.
expressionParser :: Parser Expr
expressionParser
= newClassInstanceParser
<|> methodCallParser
<|> AExprAsExpr <$> aExpr
<|> identifierParser
This means using the expressionParser
I can now parse all the different expressions including arithmetic expressions. If it happens to be an arithmetic expression it gets wrapped in AExprAsExpr
.
Problem
I would like to parse arithmetic expressions containing other expressions.
E.g.
x + y
To do this my original thought was to change the arithmetic parser so it also parses expressions.
data AExpr
= ExprAsAExpr Expr
| IntConst Integer
| Neg AExpr
| ABinary ABinOp AExpr AExpr
deriving (Show, Eq)
aExpr :: Parser AExpr
aExpr = makeExprParser aTerm aOperators
aTerm :: Parser AExpr
aTerm
= parens aExpr
<|> IntConst <$> integerParser
<|> ExprAsAExpr <$> expressionParser
aOperators :: [[Operator Parser AExpr]]
aOperators =
[ [Prefix (Neg <$ symbol "-") ]
, [ InfixL (ABinary Multiply <$ symbol "*")
, InfixL (ABinary Divide <$ symbol "/") ]
, [ InfixL (ABinary Add <$ symbol "+")
, InfixL (ABinary Subtract <$ symbol "-") ]
]
The problem with this is there is a recursive loop as aTerm
calls the expression parser, the expression parser calls aExpr
. This leads to an infinite loop. There is also the issue that all identifiers
will now be wrapped inside an AExprAsExpr
.
What is the correct method of parsing expressions inside arithmetic expressions?
EDIT I just now realized that you are using makeExpressionParser
and my answer doesn't really apply to that. Anyway maybe this answer is still helpful?
Parsec is a type of recursive-descent parser, which means it cannot handle left recursion, as you are seeing. You need to factor it out, which can always be done if the grammar is context-free. One way you see this factorization done is by having a production for each precedence level. Here is an example grammar for simple arithmetic:
expr ::= addExpr
addExpr ::= mulExpr '+' addExpr
| mulExpr '-' addExpr
| mulExpr
mulExpr ::= term '*' mulExpr
| term '/' mulExpr
| term
term ::= '(' expr ')'
| number
Notice the pattern: the first symbol in each production calls down to the next more specific one. Then explicit parentheses allow us to re-enter the top-level production. This is generally how operator precedence is expressed in recursive descent.
The above grammar can only produce right-nested expressions. While it will accept exactly the right strings, it does not correctly parse when the interpretation is left-associative. In particular,
1 - 2 - 3 - 4
will be parsed as
1 - (2 - (3 - 4))
which is not correct according to our conventions. In a general recursive-descent parser you have to do some tricks to associate correctly here. In Parsec, however, we have the many
combinators, which we can use to our advantage. For example, to parse left-associated subtraction, we could say
subExpr = foldl1 (-) <$> many1 mulExpr
The next level here are apparently the chainl
combinators which seem to have been designed for just this purpose (though I just learned about it now -- guess I should have perused the docs more). An example of using this would be
addExpr = chainl1 mulExpr oper
where
oper = choice [ (+) <$ symbol '+'
, (-) <$ symbol '-'
]
I love writing parsers in Haskell. Good luck!