
Xtext: grammar for language with significant/semantic whitespace


How can I use Xtext to parse languages with semantic whitespace? I'm trying to write a grammar for CoffeeScript and I can't find any good documentation on this.


Solution

  • AFAIK, you can't.

    To parse Python-like languages, you'd need the lexer to emit INDENT and DEDENT tokens. For that to happen, semantic predicates would have to be supported inside lexer rules (Xtext's terminal rules), so that a rule could first check whether the position-in-line of the next character in the input equals 0 (the beginning of the line) and whether that character is a ' ' or '\t'.

    But browsing through the documentation, I don't see that this is supported by Xtext at the moment. Since Xtext 2.0, support has been added for syntactic predicates in production rules (see: 6.2.8. Syntactic Predicates), but there are no semantic predicates in terminal rules.

    The only way to do this with Xtext would be to let the lexer produce whitespace and line-break terminals and deal with them explicitly in the grammar, but this would make an utter mess of your production rules.

    If you want to parse such a language using Java (and a Java-oriented parser generator), I'd recommend ANTLR, in which you can emit such INDENT and DEDENT tokens quite easily. But if you're keen on Eclipse integration, then I don't see how you'd be able to do this using Xtext, sorry.
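
To make the INDENT/DEDENT idea above concrete, here is a minimal sketch of what such a lexer has to do, written in plain Python rather than Xtext or ANTLR. Everything in it is an assumption for illustration: the function name `indent_tokens`, the token kinds `LINE`, `INDENT` and `DEDENT`, and the choice to count a tab as one column are all hypothetical and not part of either tool. A real lexer would also tokenize the contents of each line.

```python
def indent_tokens(source):
    """Return (kind, text) pairs, adding INDENT/DEDENT around nested blocks.

    Hypothetical helper for illustration only; not an Xtext or ANTLR API.
    """
    stack = [0]   # indentation widths of the blocks that are currently open
    tokens = []
    for line in source.splitlines():
        stripped = line.lstrip(" \t")
        if not stripped:
            continue  # blank lines do not change the block structure
        # Assumption: every leading ' ' or '\t' counts as one column.
        width = len(line) - len(stripped)
        if width > stack[-1]:
            # Deeper than the current block: open a new one.
            stack.append(width)
            tokens.append(("INDENT", ""))
        else:
            # Shallower: close blocks until a matching width is found.
            while width < stack[-1]:
                stack.pop()
                tokens.append(("DEDENT", ""))
            if width != stack[-1]:
                raise ValueError(f"inconsistent dedent: {line!r}")
        tokens.append(("LINE", stripped))
    # Close any blocks that are still open at end of input.
    while len(stack) > 1:
        stack.pop()
        tokens.append(("DEDENT", ""))
    return tokens


if __name__ == "__main__":
    for kind, text in indent_tokens("if x\n  y\n  if z\n    w\nv\n"):
        print(kind, text)
```

The key point is the stack of open indentation widths: the decision to emit INDENT or DEDENT depends on state carried across lines, which is exactly what a terminal rule without semantic predicates or custom lexer code cannot express.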