Search code examples
streamocamlcamlp4

Ocaml lexer / parser rules


I wrote a program in ocaml that given an infix expression like 1 + 2, outputs the prefix notation : + 1 2

My problem is I don't find a way to make a rules like : all value, operator and bracket should be always separated by at least one space: 1+ 1 would be wrong 1 + 1 ok. I would like to not use the ocamlp4 grammar.

here is the code:

open Genlex                                                                                                                                                               

type tree =
  | Leaf of string
  | Node of tree * string * tree

let my_lexer str =
  let kwds = ["("; ")"; "+"; "-"; "*"; "/"] in
    make_lexer kwds (Stream.of_string str)

let make_tree_from_stream stream =
  let op_parser operator_l higher_perm =
    let rec aux left higher_perm = parser
        [<'Kwd op when List.mem op operator_l; right = higher_perm; s >]
        -> aux (Node (left, op, right)) higher_perm s
      | [< >]
        -> left
    in
      parser [< left = higher_perm; s >]        -> aux left higher_perm s
  in
  let rec high_perm l = op_parser ["*"; "/"] brackets l
  and low_perm l = op_parser ["+"; "-"] high_perm l
  and brackets = parser
    | [< 'Kwd "("; e = low_perm; 'Kwd ")" >]    -> e
    | [< 'Ident n >]                            -> Leaf n
    | [< 'Int n >]                              -> Leaf (string_of_int n)
  in
    low_perm stream

let rec draw_tree = function
  | Leaf n              -> Printf.printf "%s" n
  | Node(fg, r, fd)     -> Printf.printf "(%s " (r);
      draw_tree fg;
      Printf.printf " ";
      draw_tree fd;
      Printf.printf ")"

let () =
  let line = read_line() in
    draw_tree (make_tree_from_stream (my_lexer line)); Printf.printf "\n"

Plus if you have some tips about the code or if you notice some error of prog style then I will appreciate that you let it me know. Thanks !


Solution

  • The Genlex provides a ready-made lexer that respects OCaml's lexical convention, and in particular ignore the spaces in the positions you mention. I don't think you can implement what you want on top of it (it is not designed as a flexible solution, but a quick way to get a prototype working).

    If you want to keep writing stream parsers, you could write your own lexer for it: define a token type, and lex a char Stream.t into a token Stream.t, which you can then parse as you wish. Otherwise, if you don't want to use Camlp4, you may want to try an LR parser generator, such as menhir (a better ocamlyacc).