Search code examples
frontendlexcode-formattingpretty-printocamllex

Common way to store positions in a formatter


I want to write a small editor for a specific language. In the editor, we will be able to indent one or several lines (ie, adding white-spaces on the left hand of each line); we will also be able to format the whole code (ie, alter white-spaces and newlines in appropriate locations).

Given a program, my front-end by ocamllex and ocamlyacc could build a Abstract Syntax Tree (AST). I would like to know what are common ways to store positions of elements in AST.

One way I guess is to append a (start) position to each element of the AST. For example, if the type of an expression is defined as follow:

type expression =
  ...
  | E_int of int
  | E_function_EEs of Function.t * (expression list)

It will become:

type expression =
  ...
  | E_int of position * int
  | E_function_EEs of position * Function.t * (expression list)

Then, if we know the length of each element, we can infer the position of everything in the editor. Is it a common way to do so? I don't find it nice...


Solution

  • You don't have to repeat position for each pattern. You could just write one in the end:

    type expression = 
      ...
      | E_int of int
      | E_function_EEs of Function.t * (expression list)
      | E_loc of position * expression
    

    As a result, for existing functions over expression, you could just add a case for E_loc, and do not need to touch existing cases.

    To construct E_loc automatically while parsing, you could add in .mly for example:

    loc(EXPRESSION):
    | t = EXPRESSION { E_loc (($startpos, $endpos), t) }
    
    (* immediate construction: *)
    expression:
    | INT { E_loc (($startpos, $endpos), E_int $1) }
    
    (* or delay construction: *)
    expression:
    | e0 = loc(expression) PLUS e1 = loc(expression) { E_function_EEs (Function.PLUS, [e0; e1]) }