This grammar for ANTLR4 should break a document up into two types of substring: wiki and nowiki.
```antlr
grammar NoWikiText;
nowiki: '<nowiki>' ~'</nowiki>'* '</nowiki>';
wiki: ~'<nowiki>'+;
document: (wiki | nowiki)*;
```
Here's the input:
<nowiki>2</nowiki>4<nowiki></nowiki>
I get two matches for nowiki. But the text "4", which should match wiki, is ignored. Why?
EDIT:
This seems to work:
```antlr
grammar NoWikiText;
P1: '<nowiki>';
P2: '</nowiki>';
NP: .;
nowiki: P1 NP* P2;
wiki: NP+;
document: (wiki | nowiki)*;
```
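Why the EDIT version works can be illustrated with a toy Python model. This is a sketch under simplifying assumptions, not the ANTLR runtime, and the helper names `tokenize` and `parse_document` are hypothetical. The lexer prefers the longest match, so `<nowiki>` and `</nowiki>` become P1/P2 tokens while every other single character becomes an NP token. The parser rules then group those tokens into wiki and nowiki segments.

```python
# Toy model of the EDIT grammar (hypothetical helpers, NOT the ANTLR runtime).
#   P1: '<nowiki>';  P2: '</nowiki>';  NP: .;
#   nowiki: P1 NP* P2;  wiki: NP+;  document: (wiki | nowiki)*;

P1, P2, NP = "P1", "P2", "NP"
LITERAL_RULES = [(P1, "<nowiki>"), (P2, "</nowiki>")]


def tokenize(text):
    """Return (type, text) pairs the way a longest-match lexer would."""
    tokens = []
    i = 0
    while i < len(text):
        # A literal rule matches 8 or 9 characters, NP only 1, so a literal
        # always wins (longest match) whenever it fits at this position.
        for kind, literal in LITERAL_RULES:
            if text.startswith(literal, i):
                tokens.append((kind, literal))
                i += len(literal)
                break
        else:
            # NP: . matches any single remaining character.
            tokens.append((NP, text[i]))
            i += 1
    return tokens


def parse_document(tokens):
    """Group tokens into ("wiki" | "nowiki", text) segments."""
    segments = []
    i = 0
    while i < len(tokens):
        kind = tokens[i][0]
        if kind == NP:
            # wiki: NP+  (greedily takes every consecutive NP token)
            start = i
            while i < len(tokens) and tokens[i][0] == NP:
                i += 1
            segments.append(("wiki", "".join(t for _, t in tokens[start:i])))
        elif kind == P1:
            # nowiki: P1 NP* P2
            i += 1
            start = i
            while i < len(tokens) and tokens[i][0] == NP:
                i += 1
            if i == len(tokens) or tokens[i][0] != P2:
                raise SyntaxError("nowiki: expected '</nowiki>'")
            segments.append(("nowiki", "".join(t for _, t in tokens[start:i])))
            i += 1
        else:
            # A stray P2 cannot start either alternative; this sketch just raises.
            raise SyntaxError("unexpected '</nowiki>'")
    return segments


print(parse_document(tokenize("<nowiki>2</nowiki>4<nowiki></nowiki>")))
# [('nowiki', '2'), ('wiki', '4'), ('nowiki', '')]
```

Because NP matches any single character, the `4` now reaches the parser as a token and `wiki: NP+` can claim it.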
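In the grammar you posted, only two token types are created: `<nowiki>` and `</nowiki>` (ANTLR generates implicit lexer rules for the literals used in the parser rules). The negation operator `~` works differently than you expect: inside a parser rule, `~'</nowiki>'` means "match any token other than `</nowiki>`", not "match any characters other than `</nowiki>`", so the only thing it can match is the token `<nowiki>`. For your input `<nowiki>2</nowiki>4<nowiki></nowiki>`, the `2` and the `4` are not recognized as valid tokens at all: the lexer reports a token recognition error for each and discards them. The parser therefore only sees `<nowiki></nowiki><nowiki></nowiki>` and matches `nowiki` twice.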
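The explanation above can be made concrete with a toy Python model of the original grammar's lexer. This is an illustrative sketch, not the ANTLR runtime: the token names `OPEN` and `CLOSE` are hypothetical stand-ins (ANTLR itself generates names such as `T__0` and `T__1` for implicit literal tokens), and `complement` is a hypothetical helper showing that `~X` in a parser rule ranges over token types, not characters.

```python
# Toy model of the ORIGINAL grammar (hypothetical names, NOT the ANTLR runtime).
# The only token types are the implicit ones created for the two literals.
TOKEN_TYPES = {"<nowiki>": "OPEN", "</nowiki>": "CLOSE"}


def tokenize(text):
    """Return (tokens, dropped_chars) for a lexer that knows only the two literals."""
    tokens, dropped = [], []
    i = 0
    while i < len(text):
        for literal, kind in TOKEN_TYPES.items():
            if text.startswith(literal, i):
                tokens.append(kind)
                i += len(literal)
                break
        else:
            # No lexer rule matches this character: ANTLR reports a
            # "token recognition error" and skips it; we just record it.
            dropped.append(text[i])
            i += 1
    return tokens, dropped


def complement(excluded):
    """In a parser rule, ~X is every token type except X (never raw characters)."""
    return set(TOKEN_TYPES.values()) - {excluded}


tokens, dropped = tokenize("<nowiki>2</nowiki>4<nowiki></nowiki>")
print(tokens)               # ['OPEN', 'CLOSE', 'OPEN', 'CLOSE']
print(dropped)              # ['2', '4']
print(complement("CLOSE"))  # {'OPEN'}: ~'</nowiki>' can only match '<nowiki>'
print(complement("OPEN"))   # {'CLOSE'}: ~'<nowiki>' can only match '</nowiki>'
```

The last line also shows that `wiki: ~'<nowiki>'+` could only ever match stray `</nowiki>` tokens, never ordinary text.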