Search code examples
f#fparsec

Parser the call of a function - FParsec


I try to parse the call of a function, here are the variants:

add 8 2
add x y
add (inc x) (dec y)
funcWithoutArgs

Depending on how I distribute my analyzers in the code, and perhaps also how they are coded, I get errors, as well as successful but unwanted analyses. For example, this:

add 4 7

returns the following AST:

[Call ("foo",[Number 4]);
 Number 7]

He therefore only takes the first parameter.

When I do that:

foo x y

He sends me back this AST:

[Call ("foo",[Call ("x",[Call ("y",[])])])]

And that's not what I want, since here, each parameter calls the next one as a parameter.

Another example, when I do this:

foo x y
inc x

I get:

[Call ("foo",[Call ("x",[Call ("y",[Call ("inc",[Call ("x",[])])])])])]

It does the same as above, but also calls the code that follows the line. When I ask my analyzer for a new line (see code), it sends me this:

[Call ("foo",[]); Call ("x",[]); Call ("y",[]); Call ("inc",[]); Call ("x",[])]

Even in brackets it doesn't work:

foo (x) (y)

Give:

[Call ("foo",[]); Call ("x",[]); Call ("y",[])]

And:

add (inc x) (dec y)

Give:

Error in Ln: 1 Col: 1
Note: The error occurred on an empty line.

The parser backtracked after:
  Error in Ln: 2 Col: 5
  add (inc x) (dec y)
      ^
  Expecting: end of input or integer number (32-bit, signed)

  The parser backtracked after:
    Error in Ln: 2 Col: 10
    add (inc x) (dec y)
             ^
    Expecting: ')'

[]

In short, my function call analyzer does not work properly. Every time I change something, like a new line, an attempt, or a different hierarchy, something doesn't work... Do you have any idea how to solve this very annoying problem?

Here is the minimum functional code that was used:

open FParsec

// Ast

type Expression =
    | Number of int
    | Call of string * Expression list

type Program = Expression list

// Tools

let private bws p =
    spaces >>? p .>>? spaces

let private suiteOf p =
    sepEndBy p spaces1

let inline private betweenParentheses p label =
    between (pstring "(") (pstring ")") p
    <?> (label + " between parentheses")

let private identifier =
    many1Satisfy2 isLetter (fun c -> isLetter c)

// Expressions

let rec private call = parse {
        let! call = pipe2 (spaces >>? identifier) (spaces >>? parameters)
                        (fun id parameters -> Call(id, parameters)) // .>>? newline
        return call
    }

and private parameters = suiteOf expression

and private callFuncWithoutArgs =
    identifier |>> fun id -> Call(id, [])

and private number = pint32 |>> Number

and private betweenParenthesesExpression =
    parse { let! ex = betweenParentheses expression "expression"
            return ex }

and private expression =
    bws (attempt betweenParenthesesExpression <|>
         attempt number <|>
         attempt call <|>
         callFuncWithoutArgs)

// -------------------------------

let parse code =
    let parser = many expression .>>? eof

    match run parser code with
        | Success(result, _, _) -> result
        | Failure(msg, _, _) ->
            printfn "%s" msg
            []

System.Console.Clear()

parse @"
add 4 7

foo x y

inc x

foo (x) (y)

add (inc x) (dec y)

" |> printfn "%A"

Solution

  • Your main problem is that you have the wrong high-level design for your parser.

    Your current design is that an expression can be:

    1. An expression (a "sub-expression", so to speak) between parentheses (no problem here)
    2. A number (no problem here)
    3. A call with parameters, which is an identifier followed by a space-separated list of expressions (this is the main part of the problem)
    4. A call without parameters, which is a single identifier (this contributes to the problem)

    Looking at the expression foo x y, let's apply those rules in order as a parser would. There are no parentheses and foo isn't a number, so it's either 3 or 4. First we try 3. foo is followed by x y: does x y parse as an expression? Why, yes, it does: it parses as a call with parameters, where x is the function and y is the parameter. Since x y matches 3, it parses according to rule 3 without checking rule 4, and so foo x y matches like foo (x y) would: a call to foo with a single parameter, which is a call to x with parameter y.

    How to fix this? Well, you could try swapping the order of 3 and 4, so that a function call without parameters is checked before a call with parameters (which would make x y parse as just x. But that would fail, because foo x y would match as just foo. So putting rule 4 before rule 3 doesn't work here.

    The real solution is to split the rules for an expression apart into two levels. The "inner" level, which I'll call a "value", could be:

    1. An expression between parentheses
    2. A number
    3. A function call without parameters

    And the "outer" level, the parse rules for expressions, would be:

    1. A function call with parameters, all of which are values, not expressions
    2. A value

    Note that these parsing levels are mutually recursive, so you'll need to use createParserForwardedToRef in your implementation. Let's look at how foo x y would be parsed with this design:

    First, foo parses as an identifier, so check if it could be a function call with parameters. Does x parse as a value? Yes, under rule 3 of values. And does y parse as a value? Yes, under rule 3 of values. So foo x y parses as a function call.

    Now what about funcWithoutParameters? It would fail rule 1 of expressions because it's not followed by a list of parameters. So it would be checked for rule 2 of expressions, and then it would match under rule 3 of values.

    Okay, a basic sanity check of the pseudocode works, so let's turn this into code. But first, I'll mention the other problem in your parser which I haven't mentioned yet, which is that you don't realize that the FParsec spaces parser also matches newlines. So when you wrap your expression parser in bws ("between whitespace"), it will also consume newlines after the text it parses. So when you're parsing something like:

    foo a b
    inc c
    

    The suiteOf expression sees the list a b inc c and turns all of those into parameters for foo. In my code below I've distinguished between FParsec's spaces parser (which includes newlines) and a parser that parses only horizontal whitespace (space and tab but not newline), using each in the appropriate place. The following code implements the design I mentioned in this answer and its output looks right to me for all the test expressions you wrote:

    open FParsec
    
    // Ast
    
    type Expression =
        | Number of int
        | Call of string * Expression list
    
    type Program = Expression list
    
    // Tools
    
    let private justSpaces  = skipMany  (pchar ' ' <|> pchar '\t')
    let private justSpaces1 = skipMany1 (pchar ' ' <|> pchar '\t')
    
    let private bws p =
        spaces >>? p .>>? spaces
    
    let private suiteOf p =
        sepEndBy1 p (justSpaces1)
    
    let inline private betweenParentheses p label =
        between (pstring "(") (pstring ")") p
        <?> (label + " between parentheses")
    
    let private identifier =
        many1Satisfy2 isLetter (fun c -> isLetter c)
    
    // Expressions
    
    let private expression, expressionImpl = createParserForwardedToRef()
    
    let private betweenParenthesesExpression =
        parse { let! ex = betweenParentheses expression "expression"
                return ex }
    
    let private callFuncWithoutArgs =
        (identifier |>> fun id -> Call(id, []))
    
    let private number = pint32 |>> Number
    
    let private value =
        justSpaces >>? (attempt betweenParenthesesExpression <|>
                        attempt number <|>
                        callFuncWithoutArgs)
    
    let private parameters = suiteOf value
    
    let rec private callImpl = parse {
            let! call = pipe2 (justSpaces >>? identifier) (justSpaces >>? parameters)
                              (fun id parameters -> Call(id, parameters))
            return call }
    
    let call = callImpl
    
    expressionImpl.Value <-
        bws (attempt call <|>
             value)
    
    // -------------------------------
    
    let parse code =
        let parser = many expression .>>? (spaces >>. eof)
    
        match run parser code with
            | Success(result, _, _) -> result
            | Failure(msg, _, _) ->
                printfn "%s" msg
                []
    
    System.Console.Clear()
    
    parse @"
    add 4 7
    
    foo x y
    
    inc x
    
    foo (x) (y)
    
    add (inc x) (dec y)
    " |> printfn "%A"
    

    P.S. I used the following operator suggested by http://www.quanttec.com/fparsec/users-guide/debugging-a-parser.html to greatly help me in tracing the problem:

    let (<!>) (p: Parser<_,_>) label : Parser<_,_> =
        fun stream ->
            printfn "%A: Entering %s" stream.Position label
            let reply = p stream
            printfn "%A: Leaving %s (%A)" stream.Position label reply.Status
            reply
    

    Usage: turn let parseFoo = ... into let parseFoo = ... <!> "foo". Then you'll get a stream of debugging output in your console that looks something like this:

    (Ln: 2, Col: 20): Entering expression
    (Ln: 3, Col: 1): Entering call
    (Ln: 3, Col: 5): Entering parameters
    (Ln: 3, Col: 5): Entering bwParens
    (Ln: 3, Col: 5): Leaving bwParens (Error)
    (Ln: 3, Col: 5): Entering number
    (Ln: 3, Col: 6): Leaving number (Ok)
    (Ln: 3, Col: 7): Entering bwParens
    (Ln: 3, Col: 7): Leaving bwParens (Error)
    (Ln: 3, Col: 7): Entering number
    (Ln: 3, Col: 8): Leaving number (Ok)
    (Ln: 3, Col: 8): Leaving parameters (Ok)
    (Ln: 3, Col: 8): Leaving call (Ok)
    (Ln: 3, Col: 8): Leaving expression (Ok)
    

    That helps greatly when you're trying to figure out why your parser isn't doing what you expect.