Using fslex, I would like to return multiple tokens for one pattern, but I don't see how to accomplish that. Even calling another rule function that returns multiple tokens would work for me.
I am trying to use something like this:
    let identifier = [ 'a'-'z' 'A'-'Z' ]+
    // ...
    rule tokenize = parse
    // ...
    | '.' identifier '(' { let value = lexeme lexbuf
                           match operations.TryFind(value) with
                           // TODO: here is the problem:
                           // I would like to return like [DOT; op; LPAREN]
                           | Some op -> op
                           | None -> ID(value) }
    | identifier         { ID (lexeme lexbuf) }
    // ...
The problem I am trying to solve here is to match predefined tokens (see the operations map) only if the identifier is between '.' and '('. Otherwise the match should be returned as an ID.
I am fairly new to fslex so I am happy for any pointers in the right direction.
(This is a separate answer)
For this specific case, this might solve your issue better:
    ...
    rule tokenize = parse
    ...
    | '.'        { DOT }
    | '('        { LPAREN }
    | identifier { ID (lexeme lexbuf) }
    ...
And the usage:
    let parse'' text =
        let lexbuf = LexBuffer<char>.FromString text
        let rec tokenize =
            let stack = ref []
            fun lexbuf ->
                if List.isEmpty !stack then
                    stack := [Lexer.tokenize lexbuf]
                let (token :: stack') = !stack // can never get a match failure:
                                               // the stack was refilled above if it was empty
                stack := stack'
                // this match fixes the ID to an OP, if necessary
                // multiple matches (and not a unified large one),
                // else EOF may cause issues - this is quite important
                match token with
                | DOT ->
                    match tokenize lexbuf with
                    | ID id ->
                        match tokenize lexbuf with
                        | LPAREN ->
                            // findOp is not shown here; it maps the identifier to its operation token
                            let op = findOp id
                            stack := op :: LPAREN :: !stack
                        | t -> stack := ID id :: t :: !stack
                    | t -> stack := t :: !stack
                | _ -> ()
                token
        Parser.start tokenize lexbuf
This rewrites IDs into operations if, and only if, they are surrounded by DOT and LPAREN.
P.S.: I use three separate matches because a unified match would either require Lazy<_> values (which would make it even less readable) or fail on a sequence like [DOT; EOF], since it would expect an additional third token.
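To try the look-ahead idea without fslex or fsyacc, here is a minimal, self-contained sketch. It is a variant of the code above rather than the same code: it uses a non-recursive wrapper with a push-back list instead of the recursive tokenize. The Token type, the OP case, the contents of operations, and the helpers rewriting, sourceOf, and drain are all hypothetical names introduced only for this illustration.

```fsharp
// Toy token type standing in for the tokens a generated parser would declare (assumption).
type Token =
    | DOT
    | LPAREN
    | ID of string
    | OP of string   // hypothetical operation token
    | EOF

// Hypothetical table playing the role of the `operations` map from the question.
let operations = Map.ofList [ "add", OP "add"; "sub", OP "sub" ]

// Falls back to a plain ID when the name is not a known operation.
let findOp name =
    match Map.tryFind name operations with
    | Some op -> op
    | None -> ID name

// Wraps a raw token source so that DOT ID LPAREN becomes DOT op LPAREN.
// Look-ahead tokens are pushed back and served before the raw source is asked again.
let rewriting (next: unit -> Token) : unit -> Token =
    let pending : Token list ref = ref []
    // Takes one token, preferring pushed-back tokens, without rewriting it.
    let take () =
        match pending.Value with
        | t :: rest ->
            pending.Value <- rest
            t
        | [] -> next ()
    let pushBack tokens = pending.Value <- tokens @ pending.Value
    fun () ->
        let token = take ()
        // Separate nested matches: the third token is only requested after an ID
        // was seen, so a trailing [DOT; EOF] never reads past EOF.
        if token = DOT then
            match take () with
            | ID name ->
                match take () with
                | LPAREN -> pushBack [ findOp name; LPAREN ]
                | t -> pushBack [ ID name; t ]
            | t -> pushBack [ t ]
        token

// Turns a fixed token list into a pull-style source that yields EOF once exhausted.
let sourceOf (tokens: Token list) : unit -> Token =
    let rest = ref tokens
    fun () ->
        match rest.Value with
        | t :: tail ->
            rest.Value <- tail
            t
        | [] -> EOF

// Pulls tokens up to and including EOF and collects them in order.
let drain (next: unit -> Token) : Token list =
    let rec loop acc =
        match next () with
        | EOF -> List.rev (EOF :: acc)
        | t -> loop (t :: acc)
    loop []

let result = drain (rewriting (sourceOf [ DOT; ID "add"; LPAREN; ID "add"; DOT ]))
// result = [DOT; OP "add"; LPAREN; ID "add"; DOT; EOF]
```

Only the first "add" is rewritten, because it is the only one that sits between DOT and LPAREN. With a real lexer, the raw source would presumably be something like fun () -> Lexer.tokenize lexbuf, and the wrapped function would be handed to the parser in place of Lexer.tokenize.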