I'm currently working on a programming language as a hobby. Lexing errors would be much easier to debug if ocamllex could print the tokens it matches as it finds them. I occasionally add print statements to my rules by hand, but there should be an easier way to do that.
So what I'm asking is, given a .mll file and some input, is there an automatic way to see the corresponding tokens?
I don't think there is a built-in way to ask the lexer to print its tokens.
If you use ocamlyacc, you can set the p option in OCAMLRUNPARAM to see a trace of the parser's actions. This is described in Section 12.5 of the OCaml manual. See Section 10.2 for a description of OCAMLRUNPARAM.
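As far as I know, the same parser trace can also be switched on from inside the program with Parsing.set_trace from the standard library, which saves setting the environment variable on every run. A minimal sketch, with the call to the generated parser left out because it depends on your grammar:

```ocaml
(* Turn on the ocamlyacc parser trace from inside the program.
   Parsing.set_trace returns the previous setting of the flag. *)
let previous_trace_setting = Parsing.set_trace true

(* ... call the ocamlyacc-generated entry point here ... *)

(* Restore the earlier setting once the interesting part has been parsed. *)
let () = ignore (Parsing.set_trace previous_trace_setting)
```

Note that this traces the parser's actions (reading a token, shifting, reducing), not the lexer's matches.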
If you don't mind a crude hack, I just wrote a small script, lext, that adds tracing to the output generated by ocamllex:
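#!/bin/sh
#
# lext: add token tracing to an ocamllex-generated .ml file.
# Usage: lext lexer.ml > traced.ml

# Prepend a wrapper that calls the real engine, then prints the current lexeme.
# The quoted heredoc delimiter keeps the backslash in \n literal in any shell.
cat <<'EOF'
let my_engine a b lexbuf =
    let res = Lexing.engine a b lexbuf in
    Printf.printf "Saw token [%s]\n" (Lexing.lexeme lexbuf);
    res
EOF

# Route every engine call in the generated code through the wrapper.
sed 's/Lexing\.engine/my_engine/g' "$@"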
It works like this:
$ cat ab.mll
rule token = parse
    [' ' '\t'] { token lexbuf }
  | '\n' { 1 }
  | '+' { 2 }
  | _ { 3 }
{
  let lexbuf = Lexing.from_channel stdin in
  try
    while true do
      ignore (token lexbuf)
    done
  with _ -> exit 0
}
$ ocamllex ab.mll
5 states, 257 transitions, table size 1058 bytes
$ lext ab.ml > abtraced.ml
$ ocamlopt -o abtraced abtraced.ml
$ echo 'a+b' | ./abtraced
Saw token []
Saw token [a]
Saw token [+]
Saw token [b]
Saw token [
]
Saw token []
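If rewriting the generated file feels too fragile, a less hacky alternative is to leave the lexer alone and wrap its entry point at the call site. The sketch below is only an illustration: describe and traced are names I made up, and the caller has to supply a function that turns a token into a string.

```ocaml
(* Format one trace line: the token (via a caller-supplied printer) and the
   text it matched. String.escaped keeps newlines and tabs on one line. *)
let describe show tok lexbuf =
  Printf.sprintf "Saw token %s [%s]" (show tok)
    (String.escaped (Lexing.lexeme lexbuf))

(* Wrap a lexer entry point so that every token it returns is also reported
   on stderr. The wrapped function has the same type as the original. *)
let traced show lex lexbuf =
  let tok = lex lexbuf in
  prerr_endline (describe show tok lexbuf);
  tok
```

With ab.mll, whose actions return plain integers, the loop in the trailer would call traced string_of_int token lexbuf instead of token lexbuf. Unlike the engine hack, this only reports tokens that are actually returned to the caller, so text skipped by a recursive call such as the whitespace rule does not show up.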