parsing functional-programming f#lex type-systems

Providing a type system for the parser/lexer specification?

It is often believed that functional programming languages such as Ocaml or F# have a type system allowing us to spend more time in writing code, and less time in debugging, compared with coding in dynamic languages like Python or JavaScript.

Now I am writing a parser specification to be used with FsLexYacc, the F# Lexer and Parser package. The parser specification has something like this, for parsing integers and identifier names:

%token <int> CSTINT
%token <string> NAME
...
Expr: 
  NAME {VAR $1}
 |CSTINT {CSTI $1}

Such kind of code is no more written in function programming languages, and therefore, they are not "protected" by the type system any more. Strange bugs can easily slip in I guess (not quite sure though).

Question: Is there any work (theoretical or practical) that tries to address this issue by providing a kind of type system also for the parser/lexer specification?

Solution

Yes, there has been a certain amount of research into this area.

It should be noted that although the semantic value stack used in LR parsing algorithms has heterogenous types, the algorithm itself is typesafe. But of course there could be mistakes in the implementation of the algorithm, so it would be ideal if the compiler itself could verify that the code produced by the parser generator is correctly typed. That turns out to be possible, and a number of implementations have been produced.

I don't have a complete bibliography handy, but I do have at hand two papers both published in 2006, which seem relevant:

Towards Efficient, Typed LR Parsers by François Pottier and Yann Régis-Gianas:

The LR parser generators that are bundled with many functional programming language implementations produce code that is untyped, needlessly inefficient, or both. We show that, using generalized algebraic data types, it is possible to produce parsers that are well-typed (so they cannot unexpectedly crash or fail) and nevertheless efficient.
Derivation of a Typed Functional LR Parser by Ralf Hinze and Ross Paterson:

This paper describes a purely functional implementation of LR parsing. We formally derive our parsers in a series of steps starting from the inverse of printing. In contrast to traditional implementations of LR parsing, the resulting parsers are fully typed, stackless and table-free. The parsing functions pursue alternatives in parallel with each alternative represented by a continuation argument. The direct implementation presents many opportunities for optimization and initial measurements show excellent performance in comparison with conventional table-driven parsers.