Tags: grammar, bison, yacc, happy

How to remove commas and parentheses from function calls


In a functional language compiler written using the Happy parser generator, which is quite similar to yacc/bison, I implemented lists and, along with them, the core functions map, concat and filter, using the following rules:

Exp:
...
| concat '(' Exp ',' Exp ')'         { Concat $3 $5 }
| map '(' Exp ',' Exp ')'            { Map $3 $5 }
| filter '(' Exp ',' Exp ')'         { Filter $3 $5 }

This works just fine, but most functional languages use neither parentheses nor commas for function application, so instead of map(myfun, [1,2,3]) I would rather write map myfun [1,2,3]. The obvious modification to the grammar is the following:

Exp:
...
| concat Exp Exp         { Concat $2 $3 }
| map Exp Exp            { Map $2 $3 }
| filter Exp Exp         { Filter $2 $3 }

But this modification introduces lots of reduce/reduce conflicts. How can I parse function calls without commas and parentheses?

The smallest conflicting grammar I could extract was this:

Exp :
    -- Math
     Exp '+' Exp                         { Op $1 Add $3 }
    | Exp '-' Exp                        { Op $1 Sub $3 }

    -- Literals
    | num                                { Num $1 }
    | '-' num %prec NEGATIVE             { Num (-$2) }

    -- Lists
    | map Exp Exp                        { Map $2 $3 }

It generates 4 reduce/reduce conflicts. Removing any of the rules still leaves conflicts. Here is the full grammar if you are interested.


Solution

  • The problem is that since there is no token in a function application, token-based precedence conflict resolution doesn't work very well -- when the parser is trying to decide between a shift that might be part of a function application and a reduce of some other expression, the lookahead token is whatever the argument expression begins with; there is no 'blank space' token that can be used.

    To hack around that problem and make it work, you need to set the precedence of EVERY token that might begin an expression (every token in FIRST(Exp)) to that of function application. If any of those tokens needs some other precedence (e.g., any token that might be either infix or prefix), this gets much trickier and might not work.
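
    For the minimal grammar above, that hack might look roughly like the sketch below (APP is a pseudo-name used only for %prec, and the exact token set depends on your lexer). Note that '-' can also begin an expression (a negated literal) but already carries the lower infix precedence -- exactly the infix-or-prefix case where this trick runs into trouble:

    %left '+' '-'
    %left APP num map                    -- every token that can begin an
                                         -- expression shares the precedence
                                         -- of function application

    ...

    | map Exp Exp %prec APP              { Map $2 $3 }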

    An alternative that might work better is to not use precedence rules at all -- instead, disambiguate the grammar with different rules for each level of precedence:

    Exp: Term | Exp '+' Term              -- additive operators: lowest precedence
    Term: Factor | Term '*' Factor        -- multiplicative operators
    Factor: Primary | Factor Primary      -- juxtaposition is function application, left-associative
    Primary: num | id | '(' Exp ')'       -- atoms: tightest binding
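
    Wired into the grammar from the question, that layered approach might look roughly like this sketch (App and Var are hypothetical constructors that are not in the original AST, and map stays a keyword taking exactly two arguments):

    Exp     : Exp '+' Factor             { Op $1 Add $3 }
            | Exp '-' Factor             { Op $1 Sub $3 }
            | Factor                     { $1 }

    Factor  : map Primary Primary        { Map $2 $3 }      -- map f xs; arguments are atomic
            | Factor Primary             { App $1 $2 }      -- juxtaposition binds tighter than '+'/'-'
            | Primary                    { $1 }

    Primary : num                        { Num $1 }
            | id                         { Var $1 }
            | '(' Exp ')'                { $2 }

    concat and filter slot in at the Factor level the same way as map. The question's prefix-minus rule is deliberately left out: once application is written by juxtaposition, f -2 and f - 2 would clash, so negative literals need separate treatment.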