Search code examples
language-agnosticparser-generator

Scannerless Parser Generators


Prologue: Although the set of languages recognized by parsers (context-free grammars) is strictly larger than the one of scanners (regular grammars), most parser generators need a scanner.

(Please don't try to explain the reasons behind it, I know them quite well).

I've seen parsers, that do not require a scanner like

There are certain advantages of using no scanner:

  • Just one concept (context-free grammars) instead of two
  • Parse multiple languages in one file (like HTML/Javascript)
  • Parse languages without reserved keywords (like PL/1)

Often, "workarounds" are used like switching the scanner on the parser's request.

Question: Do you know any other scannerless parser generators (any language)? Are these practical in use (or purely academic)? Are there any other approaches than Tomita/GLR?

Answers:


Solution

  • Two others:

    • Bryan Ford's Parsing Expression Grammars (PEG) require no scanner. Efficient, lazy "packrat parser" is optional. I've had nothing but good experience with the Lua LPEG version, which compiles to an efficient bytecode machine. Quite practical.

    • YAKKER looks very intriguing although it is still clearly in a pre-release state. They are using what they claim to be an efficient variation on Earley's parsing algorithm.

    I'm actually a huge fan of scannerless parsers; they simplify configuration enormously. And typical scanner generators, to put it mildly, are not much fun to use. From the man page for Lex:

    The asteroid to kill this dinosaur is still in orbit.

    Finally, I have no personal experience with Elkhound, but the secondhand reports I hear are impressive. I would say there's no question but that some scannerless parser generators are very practical.