
How to return multiple tokens for one fslex rule pattern?


Using fslex, I would like to return multiple tokens for one pattern, but I don't see a way to accomplish that. Using another rule function that returns multiple tokens would also work for me.

I am trying to use something like this:

let identifier = [ 'a'-'z' 'A'-'Z' ]+

// ...

rule tokenize = parse
// ...
| '.' identifier '(' { let value = lexeme lexbuf
                       match operations.TryFind(value) with
                       // TODO: here is the problem:
                       // I would like to return something like [DOT; op; LPAREN]
                       | Some op -> op
                       | None    -> ID(value) }

| identifier         { ID (lexeme lexbuf) }
// ...

The problem I am trying to solve here is to match predefined tokens (see the operations map) only if the identifier appears between '.' and '('. Otherwise, the match should be returned as an ID.
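
For concreteness, the token type and operations map I have in mind look roughly like this (SIN, COS, and RPAREN are just placeholder names, not part of the real code):

type token =
    | DOT
    | LPAREN
    | RPAREN
    | SIN            // example "operation" tokens
    | COS
    | ID of string
    | EOF

let operations : Map<string, token> =
    Map.ofList [ "sin", SIN; "cos", COS ]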

I am fairly new to fslex, so any pointers in the right direction are appreciated.


Solution


    For this specific case, keeping the lexer rules simple and fixing up the token stream between the lexer and the parser might solve your issue better:

    ...
    
    rule tokenize = parse
    ...
    | '.' { DOT }
    | '(' { LPAREN }
    | identifier { ID (lexeme lexbuf) }
    
    ...
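
    The usage below relies on a findOp helper that is not shown here; a
    minimal sketch based on the operations map from the question, falling
    back to a plain ID when the name is not a known operation:

    let findOp id =
        match operations.TryFind id with
        | Some op -> op
        | None    -> ID id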
    

    And the usage:

    // requires: open Microsoft.FSharp.Text.Lexing
    // (FSharp.Text.Lexing in newer FsLexYacc versions)
    let parse'' text =
        let lexbuf = LexBuffer<char>.FromString text
        let rec tokenize =
            let stack = ref []
            fun lexbuf ->
                if List.isEmpty !stack then
                    stack := [Lexer.tokenize lexbuf]
                let (token :: stack') = !stack // can never fail: the stack
                                               // was refilled above if it
                                               // was empty
                stack := stack'
                // this match fixes the ID to an OP, if necessary;
                // multiple nested matches (and not one unified match),
                // else EOF may cause issues - this is quite important
                match token with
                | DOT ->
                    match tokenize lexbuf with
                    | ID id ->
                        match tokenize lexbuf with
                        | LPAREN ->
                            let op = findOp id
                            stack := op :: LPAREN :: !stack
                        | t -> stack := ID id :: t :: !stack
                    | t -> stack := t :: !stack
                | _ -> ()
                token
        Parser.start tokenize lexbuf
    

    This will fix the IDs to be operations when, and only when, they are surrounded by DOT and LPAREN.
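
    To see the intended rewrite in isolation, here is a hypothetical
    list-based sketch of the same idea (using the token type and findOp
    assumed above), which is convenient for testing without a LexBuffer:

    let rec fixOps tokens =
        match tokens with
        | DOT :: ID id :: LPAREN :: rest ->
            DOT :: findOp id :: LPAREN :: fixOps rest
        | t :: rest -> t :: fixOps rest
        | [] -> []

    // fixOps [ID "a"; DOT; ID "sin"; LPAREN; ID "x"; RPAREN]
    //   = [ID "a"; DOT; SIN; LPAREN; ID "x"; RPAREN]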

    P.S.: I have 3 separate matches because a unified match would either require Lazy<_> values (which would make it even less readable) or fail on a sequence of [DOT; EOF], because it would expect an additional third token.