ANTLR 4: using fragments to exclude characters


I have a working TOKEN that excludes certain characters. It must not start with + or -, but these characters are allowed after the start.

TOKEN : ~('+' | '-' | '\u0000' .. '\u001f' | ' ' | '<' | '>' | ':' | '"' | '/' | '\\' | '|' | '?' | '*' | '#' | '@') ~('\u0000' .. '\u001f' | ' ' | '<' | '>' | ':' | '"' | '/' | '\\' | '|' | '?' | '*' | '#' | '@')+ ;

I have been trying to simplify it using fragments...

fragment EXCLUDED : ('\u0000' .. '\u001f' | ' ' | '<' | '>' | ':' | '"' | '/' | '\\' | '|' | '?' | '*' | '#' | '@');
fragment RESERVED : ('+' | '-') ;
TOKEN : ~(RESERVED | EXCLUDED) ~(EXCLUDED)+ ;

However, I get the error "rule reference RESERVED is not currently supported in a set".


Solution

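  • ANTLR 4 does not accept rule references, fragments included, inside a negated set ~( ... ); only character literals, ranges, and character sets may appear there, which is what the error is reporting. With ANTLR 4's shorter character set notation [...], you probably don't need negated fragments at all. The rule: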

    TOKEN
     : ~('+' | '-' | '\u0000' .. '\u001f' | ' ' | '<' | '>' | ':' | '"' | '/' | '\\' | '|' | '?' | '*' | '#' | '@') ~('\u0000' .. '\u001f' | ' ' | '<' | '>' | ':' | '"' | '/' | '\\' | '|' | '?' | '*' | '#' | '@')+
     ;
    

    is the same as this:

    TOKEN
     : ~[+\-\u0000-\u001f <>:"/\\|?*#@] ~[\u0000-\u001f <>:"/\\|?*#@]+
     ;
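
    If you would still like named building blocks, one option is to move the negation inside the fragments, so that TOKEN only refers to fragments that are already complete sets. A minimal sketch, assuming the hypothetical fragment names TOKEN_START and TOKEN_CHAR (they are not part of the original grammar):

    // First character: anything except '+', '-' and the excluded characters.
    fragment TOKEN_START
     : ~[+\-\u0000-\u001f <>:"/\\|?*#@]
     ;

    // Following characters: same exclusions, but '+' and '-' are allowed.
    fragment TOKEN_CHAR
     : ~[\u0000-\u001f <>:"/\\|?*#@]
     ;

    TOKEN
     : TOKEN_START TOKEN_CHAR+
     ;

    Like the original rule, this only matches tokens of two or more characters. If single-character tokens should match as well, changing TOKEN_CHAR+ to TOKEN_CHAR* should do it.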