Search code examples
ocamlcamlp4

Use Camlp4 to parse a string of universal and existential quantifiers


I am using Camlp4 to parse a string of quantifiers in which quantification keywords and variables are separated by comma. One example is like below:

exists x,y,z, forall a,b, exists h,k

Here, exists and forall are keywords, x,y,z,a,b,h,k are identifiers. The corresponding tokens are EXISTS, FORALL and IDENTIFIER of string.

My data structure:

type quantifier =
  | Exists of string
  | Forall of string

To parse the string of quantifiers above, my rules are:

id: [[
  `IDENTIFIER s-> s
]];

one_kind_quantifiers: [[
  `EXISTS; il=LIST1 id SEP `COMMA -> List.map (fun v -> Exists v) il
 |`FORALL; il=LIST1 id SEP `COMMA -> List.map (fun v -> Forall v) il
]];

quantifiers: [[
  t=LIST0 one_kind_quantifiers SEP `COMMA -> List.flatten t
]];

However, my parser always throws an error:

Stream.Error("[id] expected after COMMA (in [one_kind_quantifiers])").

Do you know how to fix this problem? How to make LIST1 stop throwing error when it detects the element after `COMMA is a keyword?

Thank you very much!

(For more information, if I use white-space to separate variables impacted by the same quantification keyword, like exists x y z, forall a b, exists h k. And remove the SEP `COMMA in the one_kind_quantifiers rule, then the parser can perfectly parse this new string).

===========================

Update solution:

With suggestion from Igor (@ygrek), I am able to write the expected parser by not using LIST1 but manually writing rules to parse list of string.

id_list: [[
  `IDENTIFIER s -> [s]
 |t=`id_list; `COMMA; `IDENTIFIER s -> t@[s]
]];

one_kind_quantifiers: [[
  `EXISTS; il=id_list -> List.map (fun v -> Exists v) il
 |`FORALL; il=id_list -> List.map (fun v -> Forall v) il
]];

quantifiers: [[
  t=LIST0 one_kind_quantifiers SEP `COMMA -> List.flatten t
]];

Note that the rule to parse a list of strings is:

id_list: [[
   `IDENTIFIER s -> [s]
 | t=`id_list; `COMMA; `IDENTIFIER s -> t@[s]
]];

but not:

id_list: [[
   `IDENTIFIER s -> [s]
 | `IDENTIFIER s; `COMMA; t=`id_list -> [s]@t
]];

The second way of writing the id_list rule throws the same error as when using LIST1. (So I guess that maybe it is the way that LIST1 is implemented...)


Solution

  • camlp4 is recursive-descent parser and IIRC it will backtrack only on the first token of each rule, once first token matches it will continue till the end of the rule. In this case for LIST1 it can match on comma so it descends but second token is not as expected and it is too late to backtrack. I guess unrolling LIST1 and inlining into your grammar will workaround this issue, but will probably be rather ugly.