I have a parser and lexer written in ocamlyacc and ocamllex. If the file to parse ends prematurely, as in I forget a semicolon at the end of a line, the application doesn't raise a syntax error. I realize it's because I'm raising and catching EOF and that is making the lexer ignore the unfinished rule, but how should I be doing this to raise a syntax error?
Here is my current parser (simplified),
%{
let parse_error s = Printf.ksprinf failwith "ERROR: %s" s
%}
%token COLON
%token SEPARATOR
%token SEMICOLON
%token <string> FLOAT
%token <string> INT
%token <string> LABEL
%type <Conf.config> command
%start command
%%
command:
| label SEPARATOR data SEMICOLON { Conf.Pair ($1,$3) }
| label SEPARATOR data_list { Conf.List ($1,$3) }
| label SEMICOLON { Conf.Single ($1) }
label :
| LABEL { Conf.Label $1 }
data :
| label { $1 }
| INT { Conf.Integer $1 }
| FLOAT { Conf.Float $1 }
data_list :
| star_data COMMA star_data data_list_ending
{ $1 :: $3 :: $4 }
data_list_ending:
| COMMA star_data data_list_ending { $2 :: $3 }
| SEMICOLON { [] }
and lexxer (simplified),
{
open ConfParser
exception Eof
}
rule token = parse
| ['\t' ' ' '\n' '\010' '\013' '\012']
{ token lexbuf }
| ['0'-'9']+ ['.'] ['0'-'9']* ('e' ['-' '+']? ['0'-'9']+)? as n
{ FLOAT n }
| ['0'-'9']+ as n { INT n }
| '#' { comment lexbuf }
| ';' { SEMICOLON }
| ['=' ':'] { SEPARATOR }
| ',' { COMMA }
| ['_' 'a'-'z' 'A'-'Z']([' ']?['a'-'z' 'A'-'Z' '0'-'9' '_' '-' '.'])* as w
{ LABEL w }
| eof { raise Eof }
and comment = parse
| ['#' '\n'] { token lexbuf }
| _ { comment lexbuf }
example input file,
one = two, three, one-hundred;
single label;
list : command, missing, a, semicolon
One solution, is to add a recursive call in the command rule to itself at the end, and adding an empty rule, all of which build a list to return to the main program. I think I maybe interpreting Eof as a expectation, and ending condition, rather then an error in the lexer, is this correct?
ocamlyacc
does not necessarily consume the whole input. If you want to force it to fail if the whole input is not parse-able, you need to match EOF
in your grammar. Instead of raising Eof
in you lexer, add a token EOF
and change your start
symbol to
%type <Conf.config list> main
main:
EOF { [] }
| command main { $1::$2 }