Search code examples
treeocamlgrammarocamllexmenhir

Example of grammar (lex/yacc) for tree description


I want to parse a tree from a file that will describe this tree (which actually is a taxonomy).

I am looking for examples of grammar (ideally lex/yacc files) that provides description of trees. It would be better if that described trees were not binary search trees but instead trees where each node (possibly) has several children (is it called a family tree? planar tree?).

Ideally, it would be perfect if this lex/yacc would actually be included in an OCaml library. But any good grammar for tree description will satisfy me.

I tried to find examples through Google or Stackoverflow but research results are overwhelmed by parse tree-related questions. I could make a grammar myself but I would like to see example first in order to do have a good starting point.


Solution

  • Here is my attempt at creating a minimal example of parsing a tree:

    I assume trees are represented as name_of_the_node(child(...), other_child(...), ...). For instance, here is a simple tree with a root and 3 leaves: root(first_leaf(), second_leaf(), third_leaf()).

    lexer.mll

    {
      open Parser
      open Lexing
    
      exception Bad_char of char
    }
    
    rule main = parse
    | ' ' | '\t' | '\n' { main lexbuf }
    | ',' { COMMA }
    | '(' { LP }
    | ')' { RP }
    | ['a'-'z' '_']+ as s { IDENT s }
    | _ as c { raise (Bad_char c) }
    

    parser.mly

    %{
      open Tree
    %}
    
    %token <string> IDENT
    %token COMMA LP RP
    
    %start <Tree.t> tree
    
    %%
    
    tree:
    label = IDENT LP children = separated_list(COMMA, tree) RP { T(label, children) }
    

    tree.ml

    type t = T of string * t list
    

    to compile:

    ocamllex lexer.mll
    ocamlc -c tree.ml
    menhir --infer -v parser.mly
    ocamlc -c parser.mli
    ocamlc -c parser.ml
    ocamlc -c lexer.ml
    

    to test into toplevel:

    ocaml tree.cmo parser.cmo lexer.cmo
    

    and then:

    let tree_of_string s = Parser.tree Lexer.main (Lexing.from_string s);;
    tree_of_string "toto (titi(), tata(tutu()))";;