Search code examples
f#fparsecleft-recursion

How to parse a recursive left syntax rule with FParsec?


I usually use FParsec for LL grammars, but sometimes it happens that in a whole grammar only one element requires left recursive parsing (so the grammar is no longer LL). Currently I have such a situation, I have a large LL grammar implemented with FParsec, but a small grammar element is bothering me because it obviously cannot be parsed correctly.

The syntax element in question is an access to an array index à la F#, e.g. myArray.[index] where myArray can be any expression and index can be any expression too. It turns out that my function calls use square brackets, not parentheses, and my identifiers can be qualified with dots.

An example of correct syntax for an expression is: std.fold[fn, f[myArray.[0]], std.tail[myArray]].

The .[] syntax element is obviously left recursive, but perhaps there is a trick that allows me to parse it anyway? My minimal code is as follows:

open FParsec

type Name = string list

type Expr =
    (* foo, Example.Bar.fizz *)
    | Variable of Name

    (* 9, 17, -1 *)
    | Integer of int

    (* foo[3, 2], Std.sqrt[2] *)
    | FunCall of Name * Expr list

    (* (a + b), (a + (1 - c)) *)
    | Parens of Expr

    (* myArray.[0], table.[index - 1] *)
    | ArrayAccess of Expr * Expr

    (* a + b *)
    | Addition of Expr * Expr

let opp =
    new OperatorPrecedenceParser<Expr, _, _>()

let pExpr = opp.ExpressionParser

let pName =
    let id =
        identifier (IdentifierOptions(isAsciiIdStart = isAsciiLetter, isAsciiIdContinue = isAsciiLetter))

    sepBy1 id (skipChar '.')

let pVariable = pName |>> Variable

let pInt = pint32 |>> Integer

let pFunCall =
    pipe4
        pName
        (spaces >>. skipChar '[')
        (sepBy (spaces >>. pExpr) (skipChar ','))
        (spaces >>. skipChar ']')
        (fun name _ args _ -> FunCall(name, args))

let pArrayAccess =
    pipe5
        pExpr
        (spaces >>. skipChar '.')
        (spaces >>. skipChar '[')
        (spaces >>. pExpr)
        (spaces >>. skipChar ']')
        (fun expr _ _ index _ -> ArrayAccess(expr, index))

let pParens =
    between (skipChar '(') (skipChar ')') (spaces >>. pExpr)

opp.TermParser <-
    choice [ attempt pFunCall
             pVariable
             pArrayAccess
             pInt
             pParens ]
    .>> spaces

let addInfixOperator str prec assoc mapping =
    opp.AddOperator
    <| InfixOperator(str, spaces, prec, assoc, (), (fun _ leftTerm rightTerm -> mapping leftTerm rightTerm))

addInfixOperator "+" 6 Associativity.Left (fun a b -> Addition(a, b))

let startParser = runParserOnString (pExpr .>> eof) () ""

printfn "%A" <| startParser "std.fold[fn, f[myArray.[0]], std.tail[myArray]]"

One way to do this is as follows: instead of making a list of parsing choices that also lists pArrayAccess like above, which will at some point cause an infinite loop, one can modify pExpr to parse the grammar element in question as an optional element following an expression:

let pExpr =
    parse {
        let! exp = opp.ExpressionParser
        let pArrayAccess =
            between (skipString ".[") (skipString "]") opp.ExpressionParser

        match! opt pArrayAccess with
        | None -> return exp
        | Some index -> return ArrayAccess(exp, index)
    }

After testing, it turns out that this works very well if the following two conditions are not met:

  1. The contents of the square brackets must not contain access to another array ;
  2. An array cannot be accessed a second time in succession (my2DArray.[x].[y]).

This restricts usage somewhat. How can I get away with this? Is there a way to do this or do I have to change the grammar?


Solution

  • Finally, a solution to this problem is quite simple: just expect a list of array access. If the list is empty, then return the initial expression, otherwise fold over all the array accesses and return the result. Here is the implementation:

    let rec pExpr =
        parse {
            let! exp = opp.ExpressionParser
    
            let pArrayAccess =
                between (skipString ".[") (skipString "]") pExpr
    
            match! many pArrayAccess with
            | [] -> return exp
            | xs -> return List.fold
                        (fun acc curr -> ArrayAccess(acc, curr)) exp xs
        }
    

    This way of doing things meets my needs, so I'd be happy with it, if anyone passes by and wants something more general and not applicable with the proposed solution, then I refer to @Martin Freedman comment, using createParserForwardedToRef().