Tags: parsing, f#, bnf, fparsec

Error in BNF FParsec parser


I have made the following parser to try to parse BNF:

open FParsec

type Literal = Literal of string
type RuleName = RuleName of string
type Term = Literal of Literal
          | RuleName of RuleName
type List = List of Term list
type Expression = Expression of List list
type Rule = Rule of RuleName * Expression
type BNF = Syntax of Rule list

let pBNF : Parser<BNF, unit> = 
   let pWS = skipMany (pchar ' ')
   let pLineEnd = skipMany1 (pchar ' ' >>. newline)

   let pLiteral = 
       let pL c = between (pchar c) (pchar c) (manySatisfy (isNoneOf ("\n" + string c)))
       (pL '"') <|> (pL '\'') |>> Literal.Literal

   let pRuleName = between (pchar '<') (pchar '>') (manySatisfy (isNoneOf "\n<>")) |>> RuleName.RuleName
   let pTerm = (pLiteral |>> Term.Literal) <|> (pRuleName |>> Term.RuleName)
   let pList = sepBy1 pTerm pWS |>> List.List
   let pExpression = sepBy1 pList (pWS >>. (pchar '|') .>> pWS) |>> Expression.Expression
   let pRule = pWS >>. pRuleName .>> pWS .>> pstring "::=" .>> pWS .>>. pExpression .>> pLineEnd |>> Rule.Rule
   many1 pRule |>> BNF.Syntax

For testing, I'm running it on the BNF grammar for BNF itself, as given on Wikipedia:

<syntax> ::= <rule> | <rule> <syntax>
<rule> ::= <opt-whitespace> "<" <rule-name> ">" <opt-whitespace> "::=" <opt-whitespace> <expression> <line-end>
<opt-whitespace> ::= " " <opt-whitespace> | ""
<expression> ::= <list> | <list> <opt-whitespace> "|" <opt-whitespace> <expression>
<line-end> ::= <opt-whitespace> <EOL> | <line-end> <line-end>
<list> ::= <term> | <term> <opt-whitespace> <list>
<term> ::= <literal> | "<" <rule-name> ">"
<literal> ::= '"' <text> '"' | "'" <text> "'"

But it always fails with this error:

Error in Ln: 1 Col: 21
<syntax> ::= <rule> | <rule> <syntax>
                    ^
Expecting: ' ', '"', '\'' or '<'

What am I doing wrong?


Edit

The code I'm using to test it:

let test =
   let text = "<syntax> ::= <rule> | <rule> <syntax>
<rule> ::= <opt-whitespace> \"<\" <rule-name> \">\" <opt-whitespace> \"::=\" <opt-whitespace> <expression> <line-end>
<opt-whitespace> ::= \" \" <opt-whitespace> | \"\"
<expression> ::= <list> | <list> <opt-whitespace> \"|\" <opt-whitespace> <expression>
<line-end> ::= <opt-whitespace> <EOL> | <line-end> <line-end>
<list> ::= <term> | <term> <opt-whitespace> <list>
<term> ::= <literal> | \"<\" <rule-name> \">\"
<literal> ::= '\"' <text> '\"' | \"'\" <text> \"'\""
   run pBNF text

Solution

  • Your first problem is with pList. In sepBy1 pTerm pWS, the separator pWS consumes the space after <rule> (column 20), and once a separator has consumed input, sepBy1 requires another term to follow rather than ending the list. The next character is '|', which cannot start a term, so the parse fails at column 21. That is why the error lists ' ' (more pWS) alongside '"', '\'' and '<'. The simplest fix is to use sepEndBy1 instead, which allows the list to end after a trailing separator.

    This will expose your next problem: pLineEnd doesn't faithfully implement <line-end>. skipMany1 (pchar ' ' >>. newline) requires exactly one space before each newline, whereas the grammar allows any number of spaces, including none. Inside skipMany1 you want pWS >>. newline rather than pchar ' ' >>. newline.

    Finally, note that your definition requires every rule, including the last one, to end with a newline, so you won't be able to parse your string as given (you would need to append a trailing newline). Instead, you might want to pull pLineEnd out of your definition of pRule and define the main parser as sepBy1 pRule pLineEnd |>> BNF.Syntax.
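Putting the three fixes together, a minimal sketch of the corrected parser might look like the following. It reuses the question's types unchanged and assumes an F# script run with dotnet fsi, where #r "nuget: FParsec" pulls in the package (in a compiled project you would add the FParsec package reference instead). The final .>> eof is an addition of mine, not part of the fixes above, so that leftover input is reported as an error instead of being silently ignored.

```fsharp
#r "nuget: FParsec"   // assumption: running as an .fsx script under dotnet fsi
open FParsec

type Literal = Literal of string
type RuleName = RuleName of string
type Term = Literal of Literal
          | RuleName of RuleName
type List = List of Term list
type Expression = Expression of List list
type Rule = Rule of RuleName * Expression
type BNF = Syntax of Rule list

let pBNF : Parser<BNF, unit> =
    let pWS = skipMany (pchar ' ')
    // <line-end>: any number of spaces followed by a newline, one or more times
    let pLineEnd = skipMany1 (pWS >>. newline)

    let pLiteral =
        let pL c = between (pchar c) (pchar c) (manySatisfy (isNoneOf ("\n" + string c)))
        (pL '"') <|> (pL '\'') |>> Literal.Literal

    let pRuleName = between (pchar '<') (pchar '>') (manySatisfy (isNoneOf "\n<>")) |>> RuleName.RuleName
    let pTerm = (pLiteral |>> Term.Literal) <|> (pRuleName |>> Term.RuleName)
    // sepEndBy1 lets a list end after trailing spaces instead of demanding another term
    let pList = sepEndBy1 pTerm pWS |>> List.List
    let pExpression = sepBy1 pList (pWS >>. pchar '|' .>> pWS) |>> Expression.Expression
    // pLineEnd is no longer part of pRule; it now separates rules instead
    let pRule = pWS >>. pRuleName .>> pWS .>> pstring "::=" .>> pWS .>>. pExpression |>> Rule.Rule
    sepBy1 pRule pLineEnd .>> eof |>> BNF.Syntax

// The question's test input, joined with explicit "\n" so CRLF source files don't matter
let text =
    String.concat "\n" [
        "<syntax> ::= <rule> | <rule> <syntax>"
        "<rule> ::= <opt-whitespace> \"<\" <rule-name> \">\" <opt-whitespace> \"::=\" <opt-whitespace> <expression> <line-end>"
        "<opt-whitespace> ::= \" \" <opt-whitespace> | \"\""
        "<expression> ::= <list> | <list> <opt-whitespace> \"|\" <opt-whitespace> <expression>"
        "<line-end> ::= <opt-whitespace> <EOL> | <line-end> <line-end>"
        "<list> ::= <term> | <term> <opt-whitespace> <list>"
        "<term> ::= <literal> | \"<\" <rule-name> \">\""
        "<literal> ::= '\"' <text> '\"' | \"'\" <text> \"'\"" ]

match run pBNF text with
| Success (Syntax rules, _, _) -> printfn "Parsed %d rules" rules.Length
| Failure (msg, _, _) -> printfn "Parse failed:\n%s" msg
```

One caveat with sepBy1 pRule pLineEnd: if the input ends with a trailing newline, pLineEnd consumes it and sepBy1 then expects another rule, so the parse fails for the same reason pList originally did. Using sepEndBy1 pRule pLineEnd there should tolerate an optional trailing line end.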