In my input, a line that starts with * is a comment line unless it starts with *+ or *-. I can ignore the comments, but I need to capture the other lines.
These are my lexer rules:
WhiteSpaces : [ \t]+;
Newlines : [\r\n]+;
Comment : '*' .*? Newlines -> skip ;
SkipTokens : (WhiteSpaces | Newlines) -> skip;
An example:
* this is a comment line
** another comment line
*+ type value
So the first two are comment lines, and I can skip them. But I don't know how to define a lexer/parser rule that can catch the last line.
Your SkipTokens lexer rule will never be matched because the rules WhiteSpaces and Newlines are placed before it: when two lexer rules match the same number of characters, the rule defined first wins. See this Q&A for an explanation of how the lexer matches tokens: ANTLR Lexer rule only seems to work as part of parser rule, and not part of another lexer rule
For it to work as you expect, do this:
SkipTokens : (WhiteSpaces | Newlines) -> skip;
fragment WhiteSpaces : [ \t]+;
fragment Newlines : [\r\n]+;
A fragment rule never produces a token on its own; it can only be used inside other lexer rules. For more detail, check this Q&A: What does "fragment" mean in ANTLR?
Now, for your question. You defined your Comment rule to always end with a line break. This means that a comment on the last line of your input is not matched unless that line ends with a line break. So you should let a comment end with either a line break or the EOF.
Something like this should do the trick:
COMMENT
: '*' ~[+\-\r\n] ~[\r\n]* // a '*' must be followed by something other than '+', '-' or a line break
| '*' ( [\r\n]+ | EOF ) // a '*' is a valid comment if directly followed by a line break, or the EOF
;
STAR_MINUS
: '*-'
;
STAR_PLUS
: '*+'
;
SPACES
: [ \t\r\n]+ -> skip
;
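To actually catch the *+ and *- lines, the tokens above could be combined with a parser rule. The following is only a sketch: the rule names file, line and directive and the WORD token are hypothetical and not part of the grammar above, and it assumes the rest of such a line is made up of whitespace-separated words.

```antlr
// Hypothetical parser rules (the names are assumptions for illustration)
file
 : line* EOF
 ;

line
 : COMMENT      // kept in the token stream here; simply ignore it when walking the tree
 | directive
 ;

directive
 : ( STAR_PLUS | STAR_MINUS ) WORD+   // e.g. "*+ type value"
 ;

// Hypothetical token: any run of characters other than whitespace or '*'
WORD
 : ~[ \t\r\n*]+
 ;
```

If the parser never needs the comments, adding -> skip to COMMENT and dropping the COMMENT alternative from line should work just as well.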
This, of course, does not mandate the * to be at the start of the line. If you want that, check out this Q&A: Handle strings starting with whitespaces
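One possible way to require the * at the start of a line is a semantic predicate on the column position. This is a sketch that assumes the Java target, where the lexer exposes getCharPositionInLine(); other targets would need the equivalent call in their own syntax.

```antlr
// Assumes the Java target: the predicate only lets the rule match in column 0
STAR_PLUS
 : {getCharPositionInLine() == 0}? '*+'
 ;

STAR_MINUS
 : {getCharPositionInLine() == 0}? '*-'
 ;
```

The same predicate could be placed in front of each alternative of COMMENT. Note that a * appearing elsewhere on a line is then no longer matched by these rules, so the grammar would need some other rule to cover it.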