How to make identifier parser stop on operators of OperationPrecedenceParser in FParsec?


I am implementing a parser for identifier names that should accept Unicode symbols. The problem I am facing is that some of my operators are also written with Unicode symbols, and these may appear directly after an identifier, for example:

time→sleep(7);

Here the arrow sign is an infix operator, which I add to my operator precedence parser:

opp.AddOperator(InfixOperator("→", ws, 10, Associativity.Right, 
      fun left right -> BinaryOperation(Arrow, left, right)))

It would be nice if I could just exclude all sign combinations added as operators to the OPP automatically. At the moment I do it manually using the following implementation for my identifier:

let variable =
    let isAsciiIdContinue = isNoneOf "→*/+-<>=≠≤≥' ,();"

    identifier (IdentifierOptions(
                    isAsciiIdContinue = isAsciiIdContinue,
                    normalization = System.Text.NormalizationForm.FormKC,
                    allowAllNonAsciiCharsInPreCheck = true))

However, this doesn't seem to work. I get the following error message when trying to parse my code:

  time→sleep(7);
      ^
The identifier contains an invalid character at the indicated position.

How can I make my variable parser stop on infix operators?


Solution

  • isAsciiIdStart and isAsciiIdContinue are only meant to specify the ASCII chars valid in an identifier. The non-ASCII chars accepted by the identifier parser are those that pass the pre-check and are valid Unicode XID chars.

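    Since the symbolic operators aren't valid Unicode XID identifier chars, you could simply use IdentifierOptions(normalization = System.Text.NormalizationForm.FormKC), without allowAllNonAsciiCharsInPreCheck = true. With that option set, every non-ASCII char, including →, passes the pre-check and is then rejected by the identifier validation, which produces the error shown in the question. With the default pre-check, the identifier ends before the operator.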
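
As a rough illustration of the suggestion above, here is a minimal sketch of an identifier parser that relies on the default pre-check, wired into an operator precedence parser. The Expr type, the ws parser and the test input are hypothetical stand-ins for the asker's own definitions, and the script assumes it is run with F# Interactive so that the FParsec package can be loaded via #r.

```fsharp
#r "nuget: FParsec"
open FParsec

// Hypothetical minimal AST, standing in for the asker's BinaryOperation type.
type Expr =
    | Var of string
    | Arrow of Expr * Expr

// Assumed whitespace parser; the question does not show how ws is defined.
let ws : Parser<unit, unit> = spaces

// No isAsciiIdContinue and no allowAllNonAsciiCharsInPreCheck:
// the default pre-check is left to decide where the identifier ends.
let variable : Parser<string, unit> =
    identifier (IdentifierOptions(normalization = System.Text.NormalizationForm.FormKC))

let opp = OperatorPrecedenceParser<Expr, unit, unit>()
opp.TermParser <- variable .>> ws |>> Var
opp.AddOperator(InfixOperator("→", ws, 10, Associativity.Right,
                              fun left right -> Arrow(left, right)))

// Parse an identifier immediately followed by the arrow operator.
let result = run (opp.ExpressionParser .>> eof) "time→sleep"
printfn "%A" result
```

If the default pre-check behaves as described in the answer, the input should parse as Arrow(Var "time", Var "sleep") instead of failing at the arrow, with no need to list the operator characters by hand.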