I have the following grammar:
lexer grammar TestLexer;

Number
    : '-'? [0-9]+
    ;

Punctuation
    : [\-.]
    ;

Identifier
    : '.'? [a-zA-Z]+
    ;

Whitespace
    : [ \t]+
      -> skip
    ;

Newline
    : ( '\r' '\n'?
      | '\n'
      )
      -> skip
    ;
and the following input file (there is an empty line before .foo):
1-2
1 -2
.foo
foo.bar
This produces the following tokens:
[@0,0:0='1',<Number>,1:0]
[@1,1:2='-2',<Number>,1:1]
[@2,5:5='1',<Number>,2:0]
[@3,7:8='-2',<Number>,2:2]
[@4,13:16='.foo',<Identifier>,4:0]
[@5,19:21='foo',<Identifier>,5:0]
[@6,22:25='.bar',<Identifier>,5:3]
[@7,28:27='<EOF>',<EOF>,6:0]
What do I need to change so that 1-2 is recognized as Number, Punctuation, Number and foo.bar as Identifier, Punctuation, Identifier?
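The output above follows from how the lexer chooses between rules: at each position it takes the rule that matches the longest stretch of input, and rule order only breaks ties. At the '-' in 1-2, Number can consume '-2' (two characters) while Punctuation can consume only '-', so Number wins. The Java sketch below imitates that selection with java.util.regex rather than the ANTLR runtime. The class name LongestMatchDemo, the tokenize helper, and the Rule:text output format are made up for illustration, and the Newline rule is left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-alone demo; it does not use the ANTLR runtime.
class LongestMatchDemo {
    // Rules in grammar order: {name, regex equivalent of the lexer rule}.
    private static final String[][] RULES = {
        {"Number", "-?[0-9]+"},
        {"Punctuation", "[\\-.]"},
        {"Identifier", "\\.?[a-zA-Z]+"},
        {"Whitespace", "[ \\t]+"},
    };

    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            String bestRule = null;
            int bestEnd = pos;
            for (String[] rule : RULES) {
                Matcher m = Pattern.compile(rule[1]).matcher(input)
                        .region(pos, input.length());
                // lookingAt() anchors at the region start; the strict '>'
                // keeps the earlier rule when two matches have equal length.
                if (m.lookingAt() && m.end() > bestEnd) {
                    bestRule = rule[0];
                    bestEnd = m.end();
                }
            }
            if (bestRule == null) {
                throw new IllegalArgumentException("No rule matches at offset " + pos);
            }
            if (!bestRule.equals("Whitespace")) { // mimic '-> skip'
                tokens.add(bestRule + ":" + input.substring(pos, bestEnd));
            }
            pos = bestEnd;
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("1-2"));     // [Number:1, Number:-2]
        System.out.println(tokenize("foo.bar")); // [Identifier:foo, Identifier:.bar]
    }
}
```

Because the choice is made on length alone, reordering the rules would not change the result here; something has to stop Number and Identifier from being candidates in the middle of an expression.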
I was able to solve this problem with semantic predicates that check the character preceding the token:
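lexer grammar X86AsmLexer;

// A Number may not start directly after a digit, so the '-' in "1-2"
// falls through to Punctuation.
Number
    : { _input.LA(-1) < '0' || _input.LA(-1) > '9' }? '-'? [0-9]+
    ;

Punctuation
    : [\-.]
    ;

// An Identifier may not start directly after a letter, so the '.' in
// "foo.bar" falls through to Punctuation. Both cases are checked because
// the rule itself accepts [a-zA-Z].
Identifier
    : { !(_input.LA(-1) >= 'a' && _input.LA(-1) <= 'z')
        && !(_input.LA(-1) >= 'A' && _input.LA(-1) <= 'Z') }?
      '.'? [a-zA-Z]+
    ;

Whitespace
    : [ \t]+
      -> skip
    ;

Newline
    : ( '\r' '\n'?
      | '\n'
      )
      -> skip
    ;

LineComment
    : ';' ~[\r\n]*
    ;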
Now the test file (again with an empty line before .foo)
1-2
1 -2
1- 2
.foo
foo.bar
is lexed as expected:
[@0,0:0='1',<Number>,1:0]
[@1,1:1='-',<Punctuation>,1:1]
[@2,2:2='2',<Number>,1:2]
[@3,5:5='1',<Number>,2:0]
[@4,7:8='-2',<Number>,2:2]
[@5,11:11='1',<Number>,3:0]
[@6,12:12='-',<Punctuation>,3:1]
[@7,14:14='2',<Number>,3:3]
[@8,19:22='.foo',<Identifier>,5:0]
[@9,25:27='foo',<Identifier>,6:0]
[@10,28:28='.',<Punctuation>,6:3]
[@11,29:31='bar',<Identifier>,6:4]
[@12,34:33='<EOF>',<EOF>,7:0]
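The effect of the predicates can be checked outside ANTLR with a small simulation. This is only a sketch under stated assumptions: it uses java.util.regex instead of the ANTLR runtime, the names LookBehindLexerDemo, Rule, guard, and tokenize are hypothetical, the Newline and LineComment rules are omitted, and the Identifier guard rejects any preceding ASCII letter in either case. Each guard receives the character before the candidate token, which is what _input.LA(-1) returns at the start of a rule, and a rule whose guard fails is simply not a candidate at that position.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.IntPredicate;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-alone demo; it does not use the ANTLR runtime.
class LookBehindLexerDemo {
    private static final class Rule {
        final String name;
        final Pattern pattern;
        final IntPredicate guard; // receives the previous char, or -1 at input start

        Rule(String name, String regex, IntPredicate guard) {
            this.name = name;
            this.pattern = Pattern.compile(regex);
            this.guard = guard;
        }
    }

    private static boolean isDigit(int c) {
        return c >= '0' && c <= '9';
    }

    private static boolean isLetter(int c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
    }

    // Rules in grammar order; Punctuation and Whitespace are unguarded.
    private static final List<Rule> RULES = Arrays.asList(
        new Rule("Number", "-?[0-9]+", prev -> !isDigit(prev)),
        new Rule("Punctuation", "[\\-.]", prev -> true),
        new Rule("Identifier", "\\.?[a-zA-Z]+", prev -> !isLetter(prev)),
        new Rule("Whitespace", "[ \\t]+", prev -> true));

    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            int prev = (pos == 0) ? -1 : (int) input.charAt(pos - 1);
            Rule best = null;
            int bestEnd = pos;
            for (Rule rule : RULES) {
                if (!rule.guard.test(prev)) {
                    continue; // predicate failed: rule is not viable here
                }
                Matcher m = rule.pattern.matcher(input).region(pos, input.length());
                // Longest match wins; the strict '>' keeps the earlier rule on ties.
                if (m.lookingAt() && m.end() > bestEnd) {
                    best = rule;
                    bestEnd = m.end();
                }
            }
            if (best == null) {
                throw new IllegalArgumentException("No rule matches at offset " + pos);
            }
            if (!best.name.equals("Whitespace")) { // mimic '-> skip'
                tokens.add(best.name + ":" + input.substring(pos, bestEnd));
            }
            pos = bestEnd;
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("1-2"));     // [Number:1, Punctuation:-, Number:2]
        System.out.println(tokenize("1 -2"));    // [Number:1, Number:-2]
        System.out.println(tokenize("foo.bar")); // [Identifier:foo, Punctuation:., Identifier:bar]
    }
}
```

Note that the guards look only one character back, so whitespace decides the reading: "1 -2" still yields a negative Number, while "1-2" and "1- 2" yield a separate '-' token.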