Tags: ocaml, lexical-analysis, ocamllex, ocamlyacc

Return multiple tokens in ocamllex


Is there any way to return multiple tokens from a single rule in ocamllex?

I'm writing a lexer and parser for an indentation-based language, and I would like my lexer to return multiple DEDENT tokens when the indentation drops back past several enclosing levels at once. That would let it tell the parser that several blocks have ended at the same point.

With this approach I could use INDENT and DEDENT as drop-in replacements for BEGIN and END, since the block boundaries would be implied by the INDENT and DEDENT tokens.
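As a rough sketch of how a lexer could decide how many DEDENT tokens to emit, the code below keeps a stack of open indentation widths and pops one level per DEDENT. The `token` type, the `levels` stack and `indent_tokens` are hypothetical names used only for illustration; a real ocamllex lexer would call something like this after measuring the leading whitespace of each line, and would use the parser's own token type.

```ocaml
(* Hypothetical token type; in a real project it comes from the parser. *)
type token = INDENT | DEDENT

(* Stack of open indentation widths, innermost first; 0 is the top level. *)
let levels = ref [0]

(* Given the indentation width of a new logical line, return the INDENT or
   DEDENT tokens needed to move from the current level to the new one. *)
let indent_tokens (width : int) : token list =
  match !levels with
  | [] -> failwith "indentation stack is empty"
  | current :: _ when width > current ->
      (* Deeper than the current block: open exactly one new block. *)
      levels := width :: !levels;
      [INDENT]
  | _ ->
      (* Pop every level deeper than [width], emitting one DEDENT per pop,
         and require that [width] matches an enclosing level exactly. *)
      let rec pop acc = function
        | top :: rest when width < top -> pop (DEDENT :: acc) rest
        | top :: _ as stack when width = top -> levels := stack; acc
        | _ -> failwith "dedent does not match any enclosing indentation level"
      in
      pop [] !levels
```

Going from width 8 (inside two nested blocks) straight back to width 0 yields `[DEDENT; DEDENT]`, which is the "several blocks end at once" case from the question; returning a list is exactly what forces the lexer to hand back more than one token per match.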


Solution

  • Return a list of tokens from the lexer. If the parser cannot consume a list directly (ocamlyacc, for example), insert a cache between the lexer and the parser:

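    (* Adapter: Lexer.tokens returns a list of tokens per call, but an
       ocamlyacc parser expects a function returning one token per call. *)
    let cache =
      let pending = ref [] in  (* tokens already lexed but not yet consumed *)
      fun lexbuf ->
        match !pending with
        | tok :: rest -> pending := rest; tok
        | [] ->
            (* Nothing buffered: run the lexer again and keep the leftovers. *)
            match Lexer.tokens lexbuf with
            | [] -> failwith "Lexer.tokens returned no tokens"
            | tok :: rest -> pending := rest; tok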
    

    Alternatively, run the lexer over the whole document first and then run the parser on the complete token stream.

    By the way, have you seen ocaml+twt?
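A self-contained sketch of the cache idea, under some assumptions: `buffered` is a hypothetical generic version of `cache` that takes the list-returning lexer as an argument, the `token` type is made up for the example, and `fake_lexer` only replays a fixed script in place of an ocamllex rule such as `Lexer.tokens`. The hypothetical `drain_all` collects every token up front, which corresponds to the alternative of lexing the whole document before parsing.

```ocaml
(* Hypothetical token type for the example. *)
type token = WORD of string | INDENT | DEDENT | EOF

(* Turn a function producing a list of tokens per call into one that
   produces a single token per call, buffering the leftovers. *)
let buffered (lex_many : 'src -> 'tok list) : 'src -> 'tok =
  let pending = ref [] in
  fun src ->
    match !pending with
    | tok :: rest -> pending := rest; tok
    | [] ->
        (match lex_many src with
         | [] -> failwith "buffered: lexer returned an empty token list"
         | tok :: rest -> pending := rest; tok)

(* Stand-in for an ocamllex rule that returns a token list per match;
   it ignores its argument and replays a fixed script. *)
let fake_lexer () : unit -> token list =
  let script = ref [ [WORD "a"]; [INDENT; WORD "b"]; [DEDENT; DEDENT]; [EOF] ] in
  fun () ->
    match !script with
    | batch :: rest -> script := rest; batch
    | [] -> [EOF]

(* Pull tokens one at a time until EOF, i.e. lex the whole input up front. *)
let drain_all (next : unit -> token) : token list =
  let rec loop acc =
    match next () with
    | EOF -> List.rev (EOF :: acc)
    | tok -> loop (tok :: acc)
  in
  loop []

let toks = drain_all (buffered (fake_lexer ()))
```

Here `toks` is `[WORD "a"; INDENT; WORD "b"; DEDENT; DEDENT; EOF]`: the two DEDENTs produced by one lexer call reach the consumer as two separate tokens. With ocamlyacc, whose entry points take a `Lexing.lexbuf -> token` function, the call would look like `Parser.main (buffered Lexer.tokens) lexbuf`, where `Parser.main` stands for whatever start symbol the grammar declares.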