parsing lexical-analysis ocamllex ocamlyacc menhir

Conflict while parsing a set of expressions

I would like to parse a set of expressions: R[3]C, R[2]C, R[3]C-R[2]C... There is a conflict I cannot solve...

Here is a part of lexer.mll:

  rule token = parse
  | 'R'            { R }
  | 'C'            { C }
  | "RC"           { RC }
  | ['0'-'9']+ as lxm { INTEGER (int_of_string lxm) }
  | '+'            { PLUS }
  | '-'            { MINUS }
  | '['            { LBRACKET }
  | ']'            { RBRACKET }
  | eof            { EOF }
  ...

A part of parser.mly:

main:
   e_expression EOF                { $1 };

e_expression:
| ec = e_cell { EE_rc (Rc.Cell ec) }
| e_expression MINUS e_expression { EE_string_EEL ("MINUS", [$1; $3]) }

e_cell:
| R LBRACKET r = index RBRACKET C c = index { (Rc.I_relative r, Rc.I_absolute c) }
| R LBRACKET r = index RBRACKET C { (Rc.I_relative r, Rc.I_relative 0) }

index:
| INTEGER { $1 }
| MINUS INTEGER { Printf.printf "%n\n" 8; 0 - $2 }

This code curiously does not work with R[3]C-R[2]C, here is parser.conflicts, that I can not really understand.

If I comment the line line | R LBRACKET r = index RBRACKET C c = index ... in e_cell, the code can parse R[3]C-R[2]C, where 3 and 2 are index, `R[3]C and R[2]C are e_cell, and R[3]C-R[2]C is e_expression.

Could anyone help?

Solution

So the problem seems to be that when it sees a "-" token after a ], the parser is unsure whether it is making an index, or if it is separating two expressions.

i.e. When the parser reaches R[3]C-, it isn't sure whether it needs to wait for an INTEGER to complete the e_cell and reduce, or reduce now and start work on another e_expression.

The best way to solve this would probably be to move the negative integer code into the lexer. I don't have an ocamllex install handy, but I think changing

['0'-'9']+

'-'? ['0'-'9']+

would work, and then remove the negative integer case from index (obviously this would cause a problem with the Printf statement, but you can make the internal logic more complex to account for this.