parsing, antlr, antlr4

How to make an ANTLR grammar that matches strings both inside and outside a delimiter?


This grammar for ANTLR4 should break a document up into two types of substring: wiki and nowiki.

grammar NoWikiText;

nowiki: '<nowiki>' ~'</nowiki>'* '</nowiki>';
wiki: ~'<nowiki>'+;
document: (wiki | nowiki)*;

Here's the input:

<nowiki>2</nowiki>4<nowiki></nowiki>

I get two matches for nowiki. But the text "4", which should match wiki, is ignored. Why?

EDIT:

This seems to work:

grammar NoWikiText;

P1: '<nowiki>';
P2: '</nowiki>';
NP: .;

nowiki: P1 NP* P2;
wiki: NP+;
document: (wiki | nowiki)*;

Solution

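  • In the first grammar you posted, only two token types are created: <nowiki> and </nowiki>, because the literals in your parser rules become implicit lexer rules. The negation operator works differently than you expect: inside a parser rule, ~'</nowiki>' means "match any single token other than </nowiki>" (so it would match the token <nowiki>), not "match any character". For your input <nowiki>2</nowiki>4<nowiki></nowiki>, the characters 2 and 4 are therefore not recognized as valid tokens. The lexer reports a token recognition error for each and skips it, so the parser never sees the 4. Your edited grammar works because the catch-all rule NP: .; turns every remaining character into a token.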
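
As a variation on the edited grammar, the whole nowiki span can also be matched in the lexer with a non-greedy loop. This is only a sketch: the token names NOWIKI and TEXT are made up for illustration, the EOF at the end of document is an addition, and it has not been tried on real wiki markup.

grammar NoWikiText;

// parser rules
document: (nowiki | wiki)* EOF;
nowiki: NOWIKI;
wiki: TEXT+;

// lexer rules
// '.*?' is non-greedy, so the token ends at the first '</nowiki>'
NOWIKI: '<nowiki>' .*? '</nowiki>';
// catch-all: any other single character becomes a TEXT token
TEXT: .;

For the input from the question, this should yield a NOWIKI token, one TEXT token for the 4, and a second NOWIKI token. To check which tokens the lexer actually produces, ANTLR's TestRig can print them with its -tokens option (for example grun NoWikiText document -tokens, if the usual grun alias is set up).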