Search code examples
functional-programmingsmlsmlnjml

Meaning of 'pos' in syntax tree


I have just started to learn SML and I am feeling confused by the following code:

type pos = int
datatype prog= Prog of statement option
    and statement =... some code..
    and expression =
         VarExp of symbol * pos
         | IntExp of int * pos
         | BoolExp of bool * pos
         | BopExp of exp * oper * exp * pos
         | UopExp of oper * exp * pos

What I am not able to understand is the use of pos. Shouldn't the signature be like BopExp of exp * oper * exp instead of BopExp of exp * oper * exp * pos ?


Solution

  • Since John, molbdnilo and Tai already answered you in comments by saying that pos is probably some kind of position, suggesting that the textual representation of the syntactic element resides at the pos'th character in a file, here's some more thoughts:

    A syntax tree might be annotated with all kinds of information. The parser usually keeps the position for error reporting. So does a static type-checker, but it might also keep inferred type information that it can pass onto itself or a subsequent code-generating phase.

    What you can do is parameterise your syntax tree:

    datatype 'a prog = Prog of 'a stmt list
    and 'a stmt = AssignStmt of symbol * 'a exp * 'a
                | PrintStmt of 'a exp * 'a
    and 'a exp = VarExp of symbol * 'a
               | IntExp of int * 'a
               | BoolExp of bool * 'a
               | BopExp of 'a exp * oper * 'a exp * 'a
               | UopExp of oper * 'a exp * 'a
    

    Then a program like x := 2 + true; might first be parsed and annotated as a pos prog:

    type pos = int
    val hello = Prog [AssignStmt ("x", BopExp (IntExp (2, 6), "+",
                                               BoolExp (true, 10),
                                               6),
                                  0)
                     ]
    

    which says that the assignment is found at position 0, the expression 2 + true is found at position 6, the integer 2 at position 6, and the boolean true at position 10. When you get to type-checking, you might instead want a (pos × typ) prog:

    type typ = Int | Bool | None | Conflict of typ list * typ
    val hello = Prog [AssignStmt ("x", BopExp (IntExp (2, (6, Int)), "+",
                                               BoolExp (true, (10, Bool)),
                                               (6, Conflict ([Int, Bool], Int)),
                                  (0, None)
                     ]
    

    which retains the position information for decent error reporting (that's still necessary if there's a type error), but also keeps information about where there are conflicting types, e.g. Conflict ([Int, Bool], Int) was meant to suggest that while it expects Int, it can't derive that because of conflicting types Int and Bool.

    Then you only ever need one syntax tree definition for positions, types, registers, whatever you can come up with to annotate your syntactic elements with.