Search code examples
haskellparser-combinatorsmegaparsec

How to correctly parse field access after function call via parser-combinators (makeExprParser) library?


I want to parse expressions like this: a().x. It should look like EAttrRef (EFuncCall (EVarRef "a") []) "x". Unfortunately my expression parser is stopping too soon, it only parses a() and then stops.

1:4:
  |
1 | a().x
  |    ^
unexpected '.'
expecting end of input

Code:

pExpr :: Parser Expr
pExpr = lexeme p & dbg "pExpr" <?> "expression"
  where
    pTerm = try pVarRef <|> pELit
    p = makeExprParser pTerm exprTable
    exprTable = [[Postfix opIndexRef], [InfixL opAttrRef], [Postfix opFuncCall]]
    opAttrRef :: Parser (Expr -> Expr -> Expr)
    opAttrRef = do
      symbol "." & dbg "opAttrRef symbol \".\""
      return r
      where
        r x (EVarRef y) = EAttrRef x y
        r x y = error [qq|opAttrRef got unexpected right operand $y (left operand was $x)|]
    opFuncCall :: Parser (Expr -> Expr)
    opFuncCall = do
      symbol "("
      args <- sepBy pExpr (symbol ",")
      symbol ")" & dbg "opFuncCall symbol \")\""
      return $ \funcExpr -> EFuncCall funcExpr args
    opIndexRef = do
      symbol "["
      e <- pExpr
      symbol "]" & dbg "opIndexRef symbol \"]\""
      return $ \obj -> EIndexRef obj e

Debug output:

opAttrRef symbol "."> IN: "().x"
opAttrRef symbol "."> MATCH (EERR): <EMPTY>
opAttrRef symbol "."> ERROR:
opAttrRef symbol "."> offset=1:
opAttrRef symbol "."> unexpected '('
opAttrRef symbol "."> expecting '.'

pExpr> IN: ").x"
pExpr> MATCH (EERR): <EMPTY>
pExpr> ERROR:
pExpr> offset=2:
pExpr> unexpected ").x"
pExpr> expecting "false", "null", "true", '"', '+', '-', '[', digit, identifier, or integer

opFuncCall symbol ")"> IN: ").x"
opFuncCall symbol ")"> MATCH (COK): ')'
opFuncCall symbol ")"> VALUE: ")"

pExpr> IN: "a().x"
pExpr> MATCH (COK): "a()"
pExpr> VALUE: EFuncCall (EVarRef "a") []

It seems to me that makeExprParser is not calling opFuncCall second time (compared to how index access debug output looks), but I have no idea why not.

It parses when I decrease opAttrRef priority, but then it produces wrong trees (e.g. right operand of x.a() would be a() which is incorrect, it should be a and then the whole think should be in function call), so I can't use that (I am quite sure current priority is correct, since it's based on the reference of that language).


Solution

  • Your current expression parser looks like the following BNF:

    expr = funcOp ;
    funcOp = attrOp , { "(" , expr, ")" } ;
    attrOp = attrOp , "." , indexOp | indexOp ;
    indexOp = term , { "[", expr, "]" } ;
    

    Once it finishes parsing funcCall, it will not go back up in the operator table and parse any attrRef or indexRef.

    The problem with decreasing the priority of opAttrRef is that parses the left and right hand side of the dot separately, when it seems like you want the parser to read from left to right, and be able to mix any of funcCall, attrRef, or indexRef. So if you want to be able to parse something like a[b](c).d(e)[f], I'd suggest changing opAttrRef from infix to postfix, and flatten the operator table into:

    exprTable = [[Postfix opIndexRef, PostFix opAttrRef, Postfix opFuncCall]]
    

    At this point, the parser becomes:

    expr = term , { indexRef | attrRef | funcCall } ;
    

    If you need multiple postfix operators being allowed, you can rewrite your expression parser like this:

    p = liftM2 (foldl (flip ($))) pTerm (many (opAttrRef <|> opIndexRef <|> opFuncCall))
    

    The p parser can be used as a term parser for makeExprParser, if you want to add arithmetic, logic and other common operators.