I have just started to learn SML and I am feeling confused by the following code:
type pos = int
datatype prog= Prog of statement option
and statement =... some code..
and expression =
VarExp of symbol * pos
| IntExp of int * pos
| BoolExp of bool * pos
| BopExp of exp * oper * exp * pos
| UopExp of oper * exp * pos
What I am not able to understand is the use of pos
. Shouldn't the signature be like BopExp of exp * oper * exp
instead of BopExp of exp * oper * exp * pos
?
Since John, molbdnilo and Tai already answered you in comments by saying that pos is probably some kind of position, suggesting that the textual representation of the syntactic element resides at the pos'th character in a file, here's some more thoughts:
A syntax tree might be annotated with all kinds of information. The parser usually keeps the position for error reporting. So does a static type-checker, but it might also keep inferred type information that it can pass onto itself or a subsequent code-generating phase.
What you can do is parameterise your syntax tree:
datatype 'a prog = Prog of 'a stmt list
and 'a stmt = AssignStmt of symbol * 'a exp * 'a
| PrintStmt of 'a exp * 'a
and 'a exp = VarExp of symbol * 'a
| IntExp of int * 'a
| BoolExp of bool * 'a
| BopExp of 'a exp * oper * 'a exp * 'a
| UopExp of oper * 'a exp * 'a
Then a program like x := 2 + true; might first be parsed and annotated as a pos prog:
type pos = int
val hello = Prog [AssignStmt ("x", BopExp (IntExp (2, 6), "+",
BoolExp (true, 10),
6),
0)
]
which says that the assignment is found at position 0, the expression 2 + true is found at position 6, the integer 2 at position 6, and the boolean true at position 10. When you get to type-checking, you might instead want a (pos × typ) prog:
type typ = Int | Bool | None | Conflict of typ list * typ
val hello = Prog [AssignStmt ("x", BopExp (IntExp (2, (6, Int)), "+",
BoolExp (true, (10, Bool)),
(6, Conflict ([Int, Bool], Int)),
(0, None)
]
which retains the position information for decent error reporting (that's still necessary if there's a type error), but also keeps information about where there are conflicting types, e.g. Conflict ([Int, Bool], Int)
was meant to suggest that while it expects Int
, it can't derive that because of conflicting types Int
and Bool
.
Then you only ever need one syntax tree definition for positions, types, registers, whatever you can come up with to annotate your syntactic elements with.