I have a handwritten LL1 parser. My AST is not as simplified as it could be. The portion for statements looks like this:
type stmt_opt = StmtExpression of assignment | OptNil
[@@deriving show]
(*stmt_list -> stmt stmt_list | ε *)
type stmtlist =
| StmtList of stmt * stmtlist
| StmtlistNil
[@@deriving show]
and stmt =
| Assignment of assignment
| Return of stmt_opt
| Parentheses of stmtlist
| If of assignment * stmt
| For of assignment * assignment * assignment * stmt
| While of assignment * stmt
(*“lparen” formals_opt “rparen” “LBRACE” vdecl_list stmt_list “RBRACE”*)
[@@deriving show]
As you can see, I'm still holding onto a lot of unnecessary information. I'd like to build my statement ast like this:
type stmt =
Block of stmt list
| Expr of expr
| Return of expr
| If of expr * stmt * stmt
| For of expr * expr * expr * stmt
| While of expr * stmt
I'm a little lost as to do this because I really built my LL1 parser by the books (which I believe don't anticipate very long grammars): every nonterminal has a parse method and every parse method returns a tokenlist and an ast.
I think, in order to build a Block type like in my goal statement AST, I need to build a list of statements inside my recursive parseStmt method. I have reduced my parser code to only the parser methods that call parseStmtList and the specific instances where they call parseStmtList
(*stmt_list = stmt stmt_list | epsilon*)
let rec parseStmtList tokenlist lst =
match tokenlist.head with
| Lexer.RightBrace -> (tokenlist, Ast.StmtlistNil )
| _ -> let (tokenlist_stmt, stmt) = parseStmt tokenlist in
let new_lst = lst::stmt in
let (tokenlist_stmt_list, stmt_list) = tokenlist_stmt new_lst |> parseStmtList in
(tokenlist_stmt_list, Ast.Block(stmt_lst))
(*stmt -> assignment SEMI
| RETURN stmt_opt SEMI
| LBRACE stmt_list RBRACE
| IF LPAREN assignment RPAREN stmt
| FOR LPAREN assignment SEMI assignment SEMI assignment RPAREN stmt
| WHILE LPAREN assignment RPAREN stmt
*)
and parseStmt tokenlist =
begin
match tokenlist.head with
| Lexer.ID identifier -> let (tokenlist_assignment, assignment) = parseAssignment tokenlist in
begin
match tokenlist_assignment.head with
| Lexer.Semicolon -> (next tokenlist_assignment, Ast.Assignment(assignment))
| _-> let err_msg = __LOC__ ^ "Syntax Error semicolon expected but received" ^ show_token_list tokenlist in
raise (Syntax_error err_msg)
end
| Lexer.LeftBrace -> let tokenlist_leftbrace = next tokenlist in
let (tokenlist_expr, expr) = parseStmtList tokenlist_leftbrace [] in
begin
match tokenlist_expr.head with
| Lexer.RightBrace -> (next tokenlist_expr, Ast.Parentheses(expr))
| _-> let err_msg = __LOC__ ^ "Syntax Error right brace expected but received" ^ show_token_list tokenlist in
raise (Syntax_error err_msg)
end
| _-> let err_msg = __LOC__ ^ "Syntax Error left brace expected but received" ^ show_token_list tokenlist in
raise (Syntax_error err_msg)
end
However, I'm getting the error:
Error: This expression has type 'a -> token_list * Ast.stmtlist
but an expression was expected of type 'b * 'c
for the line let (tokenlist_stmt_list, stmt_list) = tokenlist_stmt new_lst |> parseStmtList in
in parseStmtList
tokenlist_stmt new_lst |> parseStmtList
Here you're applying tokenlist_stmt
to the argument new_lst
and then applying parseStmtList
to the result. But tokenlist_stmt
isn't actually a function, so this a type error.
Presumably your intention was to call parseStmtList
with tokenlist_stmt
and new_lst
as its two argument. The syntax for that is simply:
parseStmtList tokenlist_stmt new_lst
Further lst::stmt
is also a type error for two reasons:
::
takes the list as its right operand, not the left, so it should be stmt::lst
lst
isn't actually a list, it's an Ast.Block
because that's what parseStmtList
returns.Once you've fixed all that, you'll notice that the list will be the wrong way around (presumably that's why you tried lst::stmt
in the first place, but you can't append to the end of a list like that). That's a common problem when building a list with an accumulator. The solution is to either reverse the list once you're done building it or to not use an accumulator in the first place.
One thing that's important to point out is that all of those issues would have applied when using Ast.stmtlist
as well. That is, if your code looked like this:
let new_lst = Ast.StmtList(lst, stmt) in
let (tokenlist_stmt_list, stmt_list) = tokenlist_stmt new_lst |> parseStmtList in
(tokenlist_stmt_list, Ast.Block(stmt_lst))
Then you'd get the exact same errors. This makes me think, that you've changed way more of the code than you would have needed to. Since your old code presumably worked, I'll assume that it looked something like this:
let rec parseStmtList tokenlist =
match tokenlist.head with
| Lexer.RightBrace -> (tokenlist, Ast.StmtlistNil )
| _ -> let (tokenlist_stmt, stmt) = parseStmt tokenlist in
let (tokenlist_stmt_list, stmt_list) = parseStmtList tokenlist_stmt in
(tokenlist_stmt_list, Ast.StmtList (stmt, stmt_lst))
and then in parseStmt
you had:
let (tokenlist_stmtlist, stmtlist) = parseStmtList tokenlist_leftbrace in
begin
match tokenlist_expr.head with
| Lexer.RightBrace -> (next tokenlist_stmtlist, Ast.Block(stmtlist))
Now after removing Ast.stmtlist
, all you need to change are the parts where you actually used its constructors and replace those parts with the list constructors (::
and []
). So the code in parseStmt
would stay completely unchanged and the only changes in parseStmtList
should be to replace the line
| Lexer.RightBrace -> (tokenlist, Ast.StmtlistNil )
with
| Lexer.RightBrace -> (tokenlist, [] )
and the line
(tokenlist_stmt_list, Ast.StmtList (stmt, stmt_lst))
with
(tokenlist_stmt_list, stmt :: stmt_lst)
If your old code looked differently than what I came up with above, you might have to change different lines, but the idea remains the same: Replace Ast.StmtList
with ::
and Ast.StmtListNil
with []
.
And that's it. That's all the change that's necessary. You were way overcomplicating it.