Search code examples
parsingocamlrecursive-descent

Ocaml: build list for Ast type in recursive function


I have a handwritten LL1 parser. My AST is not as simplified as it could be. The portion for statements looks like this:

type stmt_opt = StmtExpression of assignment | OptNil
[@@deriving show]
(*stmt_list -> stmt stmt_list | ε *)
type stmtlist =
    | StmtList of stmt * stmtlist
    | StmtlistNil 
[@@deriving show]

and stmt = 
| Assignment of assignment
| Return of stmt_opt 
| Parentheses of stmtlist
| If of assignment * stmt
| For of assignment * assignment * assignment * stmt
| While of assignment * stmt
(*“lparen” formals_opt “rparen” “LBRACE” vdecl_list stmt_list “RBRACE”*)
[@@deriving show]

As you can see, I'm still holding onto a lot of unnecessary information. I'd like to build my statement ast like this:

type stmt =
    Block of stmt list
  | Expr of expr
  | Return of expr
  | If of expr * stmt * stmt
  | For of expr * expr * expr * stmt
  | While of expr * stmt

I'm a little lost as to do this because I really built my LL1 parser by the books (which I believe don't anticipate very long grammars): every nonterminal has a parse method and every parse method returns a tokenlist and an ast.

I think, in order to build a Block type like in my goal statement AST, I need to build a list of statements inside my recursive parseStmt method. I have reduced my parser code to only the parser methods that call parseStmtList and the specific instances where they call parseStmtList

(*stmt_list = stmt stmt_list | epsilon*)
let rec parseStmtList tokenlist lst = 
    match tokenlist.head with 
    | Lexer.RightBrace -> (tokenlist, Ast.StmtlistNil )
    | _ -> let (tokenlist_stmt, stmt) = parseStmt tokenlist in 
          let new_lst = lst::stmt in
          let (tokenlist_stmt_list, stmt_list) = tokenlist_stmt new_lst |> parseStmtList in
          (tokenlist_stmt_list, Ast.Block(stmt_lst))

(*stmt -> assignment SEMI 
|  RETURN stmt_opt SEMI
|  LBRACE stmt_list RBRACE 
|  IF LPAREN assignment RPAREN stmt 
|  FOR LPAREN assignment SEMI assignment SEMI assignment RPAREN stmt  
|  WHILE LPAREN assignment RPAREN stmt
*)

and parseStmt tokenlist = 
   begin
   match tokenlist.head with 
   | Lexer.ID identifier -> let (tokenlist_assignment, assignment) = parseAssignment tokenlist in
                begin
                match tokenlist_assignment.head with
                | Lexer.Semicolon -> (next tokenlist_assignment, Ast.Assignment(assignment))
                | _-> let err_msg = __LOC__ ^ "Syntax Error semicolon expected but received" ^ show_token_list tokenlist in
                     raise (Syntax_error err_msg) 
                end
         | Lexer.LeftBrace -> let tokenlist_leftbrace = next tokenlist in 
                        let (tokenlist_expr, expr) = parseStmtList tokenlist_leftbrace [] in
                        begin
                        match tokenlist_expr.head with
                        | Lexer.RightBrace -> (next tokenlist_expr, Ast.Parentheses(expr))
                        | _-> let err_msg = __LOC__ ^ "Syntax Error right brace expected but received" ^ show_token_list tokenlist in
                              raise (Syntax_error err_msg)
                        end
   | _-> let err_msg = __LOC__ ^ "Syntax Error left brace expected but received" ^ show_token_list tokenlist in
                    raise (Syntax_error err_msg)
    end

However, I'm getting the error:

Error: This expression has type 'a -> token_list * Ast.stmtlist
       but an expression was expected of type 'b * 'c

for the line let (tokenlist_stmt_list, stmt_list) = tokenlist_stmt new_lst |> parseStmtList in in parseStmtList


Solution

  • tokenlist_stmt new_lst |> parseStmtList
    

    Here you're applying tokenlist_stmt to the argument new_lst and then applying parseStmtList to the result. But tokenlist_stmt isn't actually a function, so this a type error.

    Presumably your intention was to call parseStmtList with tokenlist_stmt and new_lst as its two argument. The syntax for that is simply:

    parseStmtList tokenlist_stmt new_lst
    

    Further lst::stmt is also a type error for two reasons:

    1. :: takes the list as its right operand, not the left, so it should be stmt::lst
    2. lst isn't actually a list, it's an Ast.Block because that's what parseStmtList returns.

    Once you've fixed all that, you'll notice that the list will be the wrong way around (presumably that's why you tried lst::stmt in the first place, but you can't append to the end of a list like that). That's a common problem when building a list with an accumulator. The solution is to either reverse the list once you're done building it or to not use an accumulator in the first place.


    One thing that's important to point out is that all of those issues would have applied when using Ast.stmtlist as well. That is, if your code looked like this:

    let new_lst = Ast.StmtList(lst, stmt) in
    let (tokenlist_stmt_list, stmt_list) = tokenlist_stmt new_lst |> parseStmtList in
    (tokenlist_stmt_list, Ast.Block(stmt_lst))
    

    Then you'd get the exact same errors. This makes me think, that you've changed way more of the code than you would have needed to. Since your old code presumably worked, I'll assume that it looked something like this:

    let rec parseStmtList tokenlist = 
        match tokenlist.head with 
        | Lexer.RightBrace -> (tokenlist, Ast.StmtlistNil )
        | _ -> let (tokenlist_stmt, stmt) = parseStmt tokenlist in 
              let (tokenlist_stmt_list, stmt_list) = parseStmtList tokenlist_stmt in
              (tokenlist_stmt_list, Ast.StmtList (stmt, stmt_lst))
    

    and then in parseStmt you had:

    let (tokenlist_stmtlist, stmtlist) = parseStmtList tokenlist_leftbrace in
    begin
      match tokenlist_expr.head with
      | Lexer.RightBrace -> (next tokenlist_stmtlist, Ast.Block(stmtlist))
    

    Now after removing Ast.stmtlist, all you need to change are the parts where you actually used its constructors and replace those parts with the list constructors (:: and []). So the code in parseStmt would stay completely unchanged and the only changes in parseStmtList should be to replace the line

    | Lexer.RightBrace -> (tokenlist, Ast.StmtlistNil )
    

    with

    | Lexer.RightBrace -> (tokenlist, [] )
    

    and the line

    (tokenlist_stmt_list, Ast.StmtList (stmt, stmt_lst))
    

    with

    (tokenlist_stmt_list, stmt :: stmt_lst)
    

    If your old code looked differently than what I came up with above, you might have to change different lines, but the idea remains the same: Replace Ast.StmtList with :: and Ast.StmtListNil with [].

    And that's it. That's all the change that's necessary. You were way overcomplicating it.