f# fsyacc fslex

Given a lexer implemented in FsLexYacc, how do I get all of the tokens?


I have a lexer and parser implemented in FsLexYacc. To debug the lexer, I would like to print all of the tokens for a given string.

Here is what I have so far:

#load "../.paket/load/net5.0/FsLexYacc.Runtime.fsx"

#load "./Domain.fs"
#load "./Parser.fs"
#load "./Lexer.fs"

open System
open System.IO
open FSharp.Text
open FSharp.Text.Lexing
open Scripting

let allTokens (input : string) =
  let lexBuffer = LexBuffer<char>.FromString input
  Lexer.tokenize lexBuffer // Only gets first token!

printfn "%A" <| allTokens "1 + 1"

NUMBER 1

But this is only the first token!

How do I get all of the tokens as a list or sequence?


Solution

  • Lexer.tokenize can be called repeatedly to get more tokens.

    Usually, your lexer definition matches on eof when it reaches the end of the input and returns a specific token to indicate "end of file".

    rule tokenize = parse
        ...
        | eof { Token.EOF }
    

    In that case, you may just call Lexer.tokenize until you receive an EOF token. You can of course do this iteratively, recursively, or by composing builtins.

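    // By composing built-ins. The sequence is lazy and the lexer is stateful,
    // so enumerate it only once (or convert it with Seq.toList).
    let allTokensSeq (lexBuffer : LexBuffer<char>) =
        Seq.initInfinite (fun _ -> Lexer.tokenize lexBuffer)
        |> Seq.takeWhile ((<>) Token.EOF)

    // Recursively, building a list. lexBuffer is passed along on each call.
    let rec allTokens (lexBuffer : LexBuffer<char>) =
        match Lexer.tokenize lexBuffer with
        | Token.EOF -> []
        | t -> t :: allTokens lexBuffer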
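
The answer mentions an iterative approach but does not show one. Below is a minimal sketch of it. So that the script runs without FsLexYacc, it uses a hand-written stand-in lexer: the `Token` type, the `ToyBuffer` record, and this `tokenize` function are all hypothetical and are not the generated `Lexer.tokenize`. Only the loop inside `allTokens` is the point.

```fsharp
// Hypothetical token type standing in for the one generated by FsYacc.
type Token =
    | NUMBER of int
    | PLUS
    | EOF

// Hypothetical stand-in for LexBuffer<char>: the input plus a cursor.
type ToyBuffer = { Text : string; mutable Pos : int }

// Hypothetical stand-in for Lexer.tokenize: each call consumes one token
// and advances the cursor, returning EOF once the input is exhausted.
let tokenize (buf : ToyBuffer) : Token =
    // Skip whitespace between tokens.
    while buf.Pos < buf.Text.Length && System.Char.IsWhiteSpace buf.Text.[buf.Pos] do
        buf.Pos <- buf.Pos + 1
    if buf.Pos >= buf.Text.Length then
        EOF
    else
        let c = buf.Text.[buf.Pos]
        if c = '+' then
            buf.Pos <- buf.Pos + 1
            PLUS
        elif System.Char.IsDigit c then
            let start = buf.Pos
            while buf.Pos < buf.Text.Length && System.Char.IsDigit buf.Text.[buf.Pos] do
                buf.Pos <- buf.Pos + 1
            NUMBER (int (buf.Text.Substring(start, buf.Pos - start)))
        else
            failwithf "Unexpected character '%c' at position %d" c buf.Pos

// Iterative variant: keep calling the lexer until it returns EOF,
// collecting every token seen before that.
let allTokens (input : string) : Token list =
    let buf = { Text = input; Pos = 0 }
    let tokens = ResizeArray<Token>()
    let mutable tok = tokenize buf
    while tok <> EOF do
        tokens.Add tok
        tok <- tokenize buf
    List.ofSeq tokens

printfn "%A" (allTokens "1 + 1")
```

With the real lexer from the question, the same loop should apply if `buf` is replaced by `LexBuffer<char>.FromString input` and `tokenize buf` by `Lexer.tokenize lexBuffer`, assuming the lexer returns a dedicated end-of-file token.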