
Get the input string that raises parsing error inside the parser


I have a frontend written in Menhir that parses an expression from a string into an expression AST. The entry point of the frontend, Parser_e.main, is called in several different places in my OCaml code, so I would like to catch possible errors inside the frontend rather than outside. When catching an error, one particularly important piece of information I want to show is the entire input string that the frontend could not parse. (Errors from the lexer are very rare, because the frontend can read almost everything.)

Following this thread, I tried to print more information when there is an error. In parser_e.mly, I added:

exception LexErr of string
exception ParseErr of string

let error msg start finish  = 
  Printf.sprintf "(line %d: char %d..%d): %s" start.pos_lnum 
       (start.pos_cnum - start.pos_bol) (finish.pos_cnum - finish.pos_bol) msg

let parse_error msg nterm =
  raise (ParseErr (error msg (rhs_start_pos nterm) (rhs_end_pos nterm)))

e_expression:
/* empty */ { EE_empty }
| INTEGER { EE_integer $1 }
| DOUBLE { EE_double $1 }
...
| error { parse_error "e_expression" 1; ERR "" }

But the error still does not include the input string. Does anyone know if there is a function I am missing that would give me that?


Solution

  • In the context of an error, you can extract the location of the failed lexeme as a pair of positions using the Parsing.symbol_start_pos and Parsing.symbol_end_pos functions. Unfortunately, the Parsing module does not provide access to the lexeme as a string. If the input was stored in a file, however, you can extract the lexeme manually, or print the error in compiler style so that a decent IDE will recognize and highlight it automatically.

    The Parser_error module is below. It defines a function Parser_error.throw that raises a Parser_error.T exception. The exception carries a diagnostic message and the position of the failed lexeme. Several handy functions are provided to extract this lexeme from a file or to generate a fileposition message. If your input is not stored in a file, you can use the string_of_exn function, which accepts the input as a string together with the payload of the Parser_error.T exception and underlines the offending substring in it. This is an example of a parser that uses this exception for error reporting.

    open Lexing
    
    (** [T (message, start, finish)]: the parser failed with [message] on 
        the input delimited by the [start] and [finish] positions. *)
    exception T of (string * position * position)
    
    (** [throw msg] raises a [Parser_error.T] exception with the corresponding
        message. Must be called in a semantic action of a production rule. *)
    let throw my_unique_msg =
      let check_pos f = try f () with _ -> dummy_pos in
      Printexc.(print_raw_backtrace stderr (get_raw_backtrace ()));
      let sp = check_pos Parsing.symbol_start_pos in
      let ep = check_pos Parsing.symbol_end_pos  in
      raise (T (my_unique_msg,sp,ep))
    
    (** [fileposition start finish] creates a string describing the position 
        of a lexeme specified by the [start] and [finish] file positions. The
        message has a format similar to that of the OCaml and GNU compilers, 
        so it is recognized by most IDEs, e.g., Emacs. *)
    let fileposition err_s err_e =
      (* [pos_cnum] is an offset from the start of the input, so subtract
         [pos_bol] to get columns relative to the start of the line *)
      Printf.sprintf
        "\nFile \"%s\", line %d, at character %d-%d\n"
        err_s.pos_fname err_s.pos_lnum
        (err_s.pos_cnum - err_s.pos_bol) (err_e.pos_cnum - err_s.pos_bol)
    
    (** [string_of_exn line exn] given a [line] of input, takes the position 
        of the failed lexeme from the exception [exn] and creates a string 
        describing the parsing error in a format similar to the one used by 
        the OCaml compiler, i.e., with fancy underlining. *) 
    let string_of_exn line (msg,err_s,err_e) =
      let b = Buffer.create 42 in
      if err_s.pos_fname <> "" then
        Buffer.add_string b (fileposition err_s err_e);
      Buffer.add_string b
        (Printf.sprintf "Parse error: %s\n%s\n" msg line);
      let start = max 0 (err_s.pos_cnum - err_s.pos_bol)  in
      for i=1 to start  do
        Buffer.add_char b ' '
      done;
      let diff = max 1 (err_e.pos_cnum - err_s.pos_cnum) in
      for i=1 to diff do
        Buffer.add_char b '^'
      done;
      Buffer.contents b
    
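    (** [extract_line err] is a helper function that extracts from the file 
        [err.pos_fname] the line designated by the position [err]. Returns 
        the last line read (possibly "") if the file cannot be opened or is 
        too short. *)
    let extract_line err =
      match open_in err.pos_fname with
      | exception Sys_error _ -> ""
      | ic ->
        let line = ref "" in
        (try
           for _i = 0 to max 0 (err.pos_lnum - 1) do
             line := input_line ic
           done
         with End_of_file | Sys_error _ -> ());
        (* close the channel on every path, including a short read *)
        close_in_noerr ic;
        !line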
    
    (** [to_string exn] converts an exception to a string *)
    let to_string ((_msg, err, _) as exn) =
      let line = extract_line err in
      string_of_exn line exn
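
To make the output format concrete, here is a minimal self-contained sketch of the same layout that string_of_exn produces when there is no file name. The helpers pos and render are hypothetical names introduced only for this illustration, and the positions are built by hand rather than taken from a real parser.

```ocaml
open Lexing

(* Hypothetical helper: build a position by hand, as a lexer would. *)
let pos ~bol ~cnum =
  { pos_fname = ""; pos_lnum = 1; pos_bol = bol; pos_cnum = cnum }

(* Same layout as string_of_exn above: the message, the input line,
   then a row of carets under the failed lexeme. *)
let render line (msg, s, e) =
  let col = max 0 (s.pos_cnum - s.pos_bol) in
  let len = max 1 (e.pos_cnum - s.pos_cnum) in
  Printf.sprintf "Parse error: %s\n%s\n%s%s" msg line
    (String.make col ' ') (String.make len '^')

let () =
  (* Pretend the parser rejected the lexeme "+*" at offsets 2..4. *)
  let exn = ("unexpected token", pos ~bol:0 ~cnum:2, pos ~bol:0 ~cnum:4) in
  print_endline (render "1 +* 2" exn)
```

With these hand-built positions the carets land under the two characters at offsets 2 and 3 of the input line.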
    

    Here is an example that shows how to use the module when there is no file and the input comes from a stream or an interactive (shell-like) source:

    let parse_command line =
      try
        let lbuf = Lexing.from_string line in
        `Ok (Parser.statement Lexer.tokens lbuf)
      with
      | Parsing.Parse_error -> `Fail "Parse error"
      | Parser_error.T exn -> `Fail (Parser_error.string_of_exn line exn)
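
Since the question asks for the whole input string to be reported from inside the frontend, one possible approach is to wrap the entry point once and attach the string there. The sketch below is only an illustration under assumptions: Parse_failure, with_input and toy_parser are hypothetical names, and toy_parser stands in for a generated entry point such as Parser_e.main applied to its lexer. Menhir-generated parsers normally raise their own Error exception instead of Parsing.Parse_error, so the handler would need to match whichever exception your parser actually raises.

```ocaml
(* Hypothetical exception: carries a message plus the entire input string. *)
exception Parse_failure of string * string

(* [with_input parse s] runs [parse] on a lexbuf built from [s]; a parser
   failure is re-raised together with the whole of [s]. *)
let with_input (parse : Lexing.lexbuf -> 'a) (s : string) : 'a =
  let lexbuf = Lexing.from_string s in
  try parse lexbuf with
  | Parsing.Parse_error ->
    (* position of the last lexeme the lexer produced *)
    let p = Lexing.lexeme_start_p lexbuf in
    let col = p.Lexing.pos_cnum - p.Lexing.pos_bol in
    let msg =
      Printf.sprintf "syntax error at line %d, column %d"
        p.Lexing.pos_lnum col
    in
    raise (Parse_failure (msg, s))

(* Stand-in for a generated entry point; it always fails, for demonstration. *)
let toy_parser (_ : Lexing.lexbuf) : int = raise Parsing.Parse_error

let () =
  match with_input toy_parser "1 + * 2" with
  | _ -> ()
  | exception Parse_failure (msg, input) ->
    Printf.printf "%s while parsing %S\n" msg input
```

Callers then go through the wrapper instead of calling the entry point directly, so every call site gets the input string in the exception without having to catch anything itself.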