Print tokenization of a string


I'm currently working on a programming language as a hobby. Lexing errors would be much easier to debug if ocamllex could print the tokens it matches as it finds them. I occasionally add print statements to my rules by hand, but there should be an easier way to do that.

So what I'm asking is: given a .mll file and some input, is there an automatic way to see the corresponding tokens?


Solution

  • I don't think there is a built-in way to ask the lexer to print its tokens.

    If you use ocamlyacc, you can set the p option in OCAMLRUNPARAM to see a trace of the parser's actions. This is described in Section 12.5 of the OCaml manual. See Section 10.2 for a description of OCAMLRUNPARAM.
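
    The same parser trace can also be switched on from inside the program with Parsing.set_trace from the standard library, which may be handier than setting an environment variable. A minimal sketch: with_parser_trace is a hypothetical helper name, not part of any library, and Fun.protect assumes OCaml 4.08 or later.

    ```ocaml
    (* Hypothetical helper: run [f] with the ocamlyacc parser trace enabled,
       then restore whatever trace setting was active before. *)
    let with_parser_trace f =
      (* Parsing.set_trace returns the previous state of the trace flag. *)
      let previous = Parsing.set_trace true in
      Fun.protect ~finally:(fun () -> ignore (Parsing.set_trace previous)) f
    ```

    A call such as with_parser_trace (fun () -> Parser.main Lexer.token lexbuf) would then trace a single parse, where Parser.main and Lexer.token stand in for your own generated entry points.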

    If you don't mind a crude hack, I just wrote a small script lext that adds tracing to the output generated by ocamllex:

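    #!/bin/sh
    # lext: add tracing to the .ml file generated by ocamllex.
    # First emit a wrapper around Lexing.engine that prints the current
    # lexeme after every engine call. The quoted heredoc passes the OCaml
    # \n escape through unchanged, whatever the shell's echo does with backslashes.
    cat <<'EOF'
    let my_engine a b lexbuf =
        let res = Lexing.engine a b lexbuf in
        Printf.printf "Saw token [%s]\n" (Lexing.lexeme lexbuf);
        res
    EOF
    # Then copy the generated code, redirecting its engine calls to the wrapper.
    sed 's/Lexing\.engine/my_engine/g' "$@"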
    

    It works like this:

    $ cat ab.mll
    rule token = parse
        [' ' '\t'] { token lexbuf }
      | '\n'       { 1 }
      | '+'        { 2 }
      | _          { 3 }
    {
        let lexbuf = Lexing.from_channel stdin in
        try
            while true do
                ignore (token lexbuf)
            done
        with _ -> exit 0
    }
    $ ocamllex ab.mll
    5 states, 257 transitions, table size 1058 bytes
    $ lext ab.ml > abtraced.ml
    $ ocamlopt -o abtraced abtraced.ml
    $ echo 'a+b' | abtraced
    Saw token []
    Saw token [a]
    Saw token [+]
    Saw token [b]
    Saw token [
    ]
    Saw token []
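
    If rewriting the generated file feels too fragile, a similar trace can be had by wrapping the lexer entry point instead, which also lets you print the token itself next to its lexeme. The following is only a sketch under assumptions: the token type, show_token, traced and trace_string are hypothetical names, and token is a hand-written stand-in for the rule in ab.mll so that the example runs without ocamllex (it raises End_of_file at the end of input, which the real ab.mll rule does not do).

    ```ocaml
    (* Hypothetical token type; the ab.mll above returns the ints 1, 2, 3. *)
    type token = NEWLINE | PLUS | OTHER

    let show_token = function
      | NEWLINE -> "NEWLINE"
      | PLUS -> "PLUS"
      | OTHER -> "OTHER"

    (* Wrap any lexer entry point: after each call, pass one line naming the
       returned token and its (escaped) lexeme to [log]. *)
    let traced ~log ~show next_token lexbuf =
      let tok = next_token lexbuf in
      log (Printf.sprintf "Saw token %s [%s]" (show tok)
             (String.escaped (Lexing.lexeme lexbuf)));
      tok

    (* Stand-in for the ocamllex-generated [token]: consumes one character
       per token and skips spaces and tabs, mirroring the rule in ab.mll.
       It updates the lexbuf positions so that Lexing.lexeme works. *)
    let rec token (lexbuf : Lexing.lexbuf) =
      let open Lexing in
      if lexbuf.lex_curr_pos >= lexbuf.lex_buffer_len then raise End_of_file;
      lexbuf.lex_start_pos <- lexbuf.lex_curr_pos;
      let c = Bytes.get lexbuf.lex_buffer lexbuf.lex_curr_pos in
      lexbuf.lex_curr_pos <- lexbuf.lex_curr_pos + 1;
      match c with
      | ' ' | '\t' -> token lexbuf
      | '\n' -> NEWLINE
      | '+' -> PLUS
      | _ -> OTHER

    (* Lex a whole string and collect the trace lines in order. *)
    let trace_string s =
      let lines = ref [] in
      let lexbuf = Lexing.from_string s in
      let next =
        traced ~log:(fun l -> lines := l :: !lines) ~show:show_token token in
      (try while true do ignore (next lexbuf) done with End_of_file -> ());
      List.rev !lines

    let () = List.iter print_endline (trace_string "a+b\n")
    ```

    With a real ocamllex lexer you would pass the generated entry point (for example Ab.token, assuming the module is named after ab.mll) in place of the stand-in, and hand the wrapped function to the parser. Unlike the sed approach, this only sees tokens that the rule actually returns, so skipped whitespace does not show up in the trace.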