I am implementing a parser for identifier names that can contain Unicode symbols. The problem I am facing is that some of my operators are also written with Unicode symbols, and these may be placed directly after an identifier, for example:
time→sleep(7);
Here the arrow sign is an infix operator, which I add to my operator precedence parser:
opp.AddOperator(InfixOperator("→", ws, 10, Associativity.Right,
                              fun left right -> BinaryOperation(Arrow, left, right)))
It would be nice if I could just exclude all sign combinations added as operators to the OPP automatically. At the moment I do it manually using the following implementation for my identifier:
let variable =
    let isAsciiIdContinue = isNoneOf "→*/+-<>=≠≤≥' ,();"
    identifier (IdentifierOptions(
                    isAsciiIdContinue = isAsciiIdContinue,
                    normalization = System.Text.NormalizationForm.FormKC,
                    allowAllNonAsciiCharsInPreCheck = true))
However, this doesn't seem to work. I get the following error message trying to parse my code:
time→sleep(7);
    ^
The identifier contains an invalid character at the indicated position.
How can I make my variable parser stop on infix operators?
isAsciiIdStart and isAsciiIdContinue are only meant to specify the ASCII chars valid in an identifier. The non-ASCII chars accepted by the identifier parser are those that pass the pre-check and are valid Unicode XID chars.
Since the symbolic operators aren't valid Unicode XID identifier chars, you could simply use IdentifierOptions(normalization = System.Text.NormalizationForm.FormKC).
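As a minimal sketch of that suggestion: the error in the question most likely comes from allowAllNonAsciiCharsInPreCheck = true, which lets "→" through the pre-check so that it is then rejected as an invalid identifier char in the validation pass. Without that option, the parser should simply stop in front of the arrow. The snippet below assumes the FParsec package is referenced and that the default ASCII identifier chars are sufficient. The name arrowCheck and the test input are only for illustration and are not part of the original code.

```fsharp
// Assumes the FParsec NuGet package is available,
// e.g. via `#r "nuget: FParsec"` in F# Interactive.
open FParsec

// Identifier parser using only the normalization option.
// Non-XID chars such as '→' do not pass the default pre-check,
// so the parser stops before them instead of reporting an error.
let variable : Parser<string, unit> =
    identifier (IdentifierOptions(normalization = System.Text.NormalizationForm.FormKC))

// Hypothetical check: parse an identifier and require the arrow right after it.
let arrowCheck = variable .>> pstring "→"

match run arrowCheck "time→sleep(7);" with
| Success (name, _, _) -> printfn "identifier: %s" name
| Failure (msg, _, _) -> printfn "%s" msg
```

In the real grammar the arrow would be consumed by the operator precedence parser rather than by pstring. The point here is only that variable leaves the operator in the input for whatever parser comes next.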