I try to parse the call of a function, here are the variants:
add 8 2
add x y
add (inc x) (dec y)
funcWithoutArgs
Depending on how I distribute my analyzers in the code, and perhaps also how they are coded, I get errors, as well as successful but unwanted analyses. For example, this:
add 4 7
returns the following AST:
[Call ("foo",[Number 4]);
Number 7]
He therefore only takes the first parameter.
When I do that:
foo x y
He sends me back this AST:
[Call ("foo",[Call ("x",[Call ("y",[])])])]
And that's not what I want, since here, each parameter calls the next one as a parameter.
Another example, when I do this:
foo x y
inc x
I get:
[Call ("foo",[Call ("x",[Call ("y",[Call ("inc",[Call ("x",[])])])])])]
It does the same as above, but also calls the code that follows the line. When I ask my analyzer for a new line (see code), it sends me this:
[Call ("foo",[]); Call ("x",[]); Call ("y",[]); Call ("inc",[]); Call ("x",[])]
Even in brackets it doesn't work:
foo (x) (y)
Give:
[Call ("foo",[]); Call ("x",[]); Call ("y",[])]
And:
add (inc x) (dec y)
Give:
Error in Ln: 1 Col: 1
Note: The error occurred on an empty line.
The parser backtracked after:
Error in Ln: 2 Col: 5
add (inc x) (dec y)
^
Expecting: end of input or integer number (32-bit, signed)
The parser backtracked after:
Error in Ln: 2 Col: 10
add (inc x) (dec y)
^
Expecting: ')'
[]
In short, my function call analyzer does not work properly. Every time I change something, like a new line, an attempt, or a different hierarchy, something doesn't work... Do you have any idea how to solve this very annoying problem?
Here is the minimum functional code that was used:
open FParsec
// Ast
type Expression =
| Number of int
| Call of string * Expression list
type Program = Expression list
// Tools
let private bws p =
spaces >>? p .>>? spaces
let private suiteOf p =
sepEndBy p spaces1
let inline private betweenParentheses p label =
between (pstring "(") (pstring ")") p
<?> (label + " between parentheses")
let private identifier =
many1Satisfy2 isLetter (fun c -> isLetter c)
// Expressions
let rec private call = parse {
let! call = pipe2 (spaces >>? identifier) (spaces >>? parameters)
(fun id parameters -> Call(id, parameters)) // .>>? newline
return call
}
and private parameters = suiteOf expression
and private callFuncWithoutArgs =
identifier |>> fun id -> Call(id, [])
and private number = pint32 |>> Number
and private betweenParenthesesExpression =
parse { let! ex = betweenParentheses expression "expression"
return ex }
and private expression =
bws (attempt betweenParenthesesExpression <|>
attempt number <|>
attempt call <|>
callFuncWithoutArgs)
// -------------------------------
let parse code =
let parser = many expression .>>? eof
match run parser code with
| Success(result, _, _) -> result
| Failure(msg, _, _) ->
printfn "%s" msg
[]
System.Console.Clear()
parse @"
add 4 7
foo x y
inc x
foo (x) (y)
add (inc x) (dec y)
" |> printfn "%A"
Your main problem is that you have the wrong high-level design for your parser.
Your current design is that an expression can be:
Looking at the expression foo x y
, let's apply those rules in order as a parser would. There are no parentheses and foo
isn't a number, so it's either 3 or 4. First we try 3. foo
is followed by x y
: does x y
parse as an expression? Why, yes, it does: it parses as a call with parameters, where x
is the function and y
is the parameter. Since x y
matches 3, it parses according to rule 3 without checking rule 4, and so foo x y
matches like foo (x y)
would: a call to foo
with a single parameter, which is a call to x
with parameter y
.
How to fix this? Well, you could try swapping the order of 3 and 4, so that a function call without parameters is checked before a call with parameters (which would make x y
parse as just x
. But that would fail, because foo x y
would match as just foo
. So putting rule 4 before rule 3 doesn't work here.
The real solution is to split the rules for an expression apart into two levels. The "inner" level, which I'll call a "value", could be:
And the "outer" level, the parse rules for expressions, would be:
Note that these parsing levels are mutually recursive, so you'll need to use createParserForwardedToRef
in your implementation. Let's look at how foo x y
would be parsed with this design:
First, foo
parses as an identifier, so check if it could be a function call with parameters. Does x
parse as a value? Yes, under rule 3 of values. And does y
parse as a value? Yes, under rule 3 of values. So foo x y
parses as a function call.
Now what about funcWithoutParameters
? It would fail rule 1 of expressions because it's not followed by a list of parameters. So it would be checked for rule 2 of expressions, and then it would match under rule 3 of values.
Okay, a basic sanity check of the pseudocode works, so let's turn this into code. But first, I'll mention the other problem in your parser which I haven't mentioned yet, which is that you don't realize that the FParsec spaces
parser also matches newlines. So when you wrap your expression
parser in bws
("between whitespace"), it will also consume newlines after the text it parses. So when you're parsing something like:
foo a b
inc c
The suiteOf expression
sees the list a b inc c
and turns all of those into parameters for foo
. In my code below I've distinguished between FParsec's spaces
parser (which includes newlines) and a parser that parses only horizontal whitespace (space and tab but not newline), using each in the appropriate place. The following code implements the design I mentioned in this answer and its output looks right to me for all the test expressions you wrote:
open FParsec
// Ast
type Expression =
| Number of int
| Call of string * Expression list
type Program = Expression list
// Tools
let private justSpaces = skipMany (pchar ' ' <|> pchar '\t')
let private justSpaces1 = skipMany1 (pchar ' ' <|> pchar '\t')
let private bws p =
spaces >>? p .>>? spaces
let private suiteOf p =
sepEndBy1 p (justSpaces1)
let inline private betweenParentheses p label =
between (pstring "(") (pstring ")") p
<?> (label + " between parentheses")
let private identifier =
many1Satisfy2 isLetter (fun c -> isLetter c)
// Expressions
let private expression, expressionImpl = createParserForwardedToRef()
let private betweenParenthesesExpression =
parse { let! ex = betweenParentheses expression "expression"
return ex }
let private callFuncWithoutArgs =
(identifier |>> fun id -> Call(id, []))
let private number = pint32 |>> Number
let private value =
justSpaces >>? (attempt betweenParenthesesExpression <|>
attempt number <|>
callFuncWithoutArgs)
let private parameters = suiteOf value
let rec private callImpl = parse {
let! call = pipe2 (justSpaces >>? identifier) (justSpaces >>? parameters)
(fun id parameters -> Call(id, parameters))
return call }
let call = callImpl
expressionImpl.Value <-
bws (attempt call <|>
value)
// -------------------------------
let parse code =
let parser = many expression .>>? (spaces >>. eof)
match run parser code with
| Success(result, _, _) -> result
| Failure(msg, _, _) ->
printfn "%s" msg
[]
System.Console.Clear()
parse @"
add 4 7
foo x y
inc x
foo (x) (y)
add (inc x) (dec y)
" |> printfn "%A"
P.S. I used the following operator suggested by http://www.quanttec.com/fparsec/users-guide/debugging-a-parser.html to greatly help me in tracing the problem:
let (<!>) (p: Parser<_,_>) label : Parser<_,_> =
fun stream ->
printfn "%A: Entering %s" stream.Position label
let reply = p stream
printfn "%A: Leaving %s (%A)" stream.Position label reply.Status
reply
Usage: turn let parseFoo = ...
into let parseFoo = ... <!> "foo"
. Then you'll get a stream of debugging output in your console that looks something like this:
(Ln: 2, Col: 20): Entering expression
(Ln: 3, Col: 1): Entering call
(Ln: 3, Col: 5): Entering parameters
(Ln: 3, Col: 5): Entering bwParens
(Ln: 3, Col: 5): Leaving bwParens (Error)
(Ln: 3, Col: 5): Entering number
(Ln: 3, Col: 6): Leaving number (Ok)
(Ln: 3, Col: 7): Entering bwParens
(Ln: 3, Col: 7): Leaving bwParens (Error)
(Ln: 3, Col: 7): Entering number
(Ln: 3, Col: 8): Leaving number (Ok)
(Ln: 3, Col: 8): Leaving parameters (Ok)
(Ln: 3, Col: 8): Leaving call (Ok)
(Ln: 3, Col: 8): Leaving expression (Ok)
That helps greatly when you're trying to figure out why your parser isn't doing what you expect.