
Lookback lexer member in antlr4


Is there a way to "fake" a lookback in ANTLR4? I want to resolve an ambiguity based on the token immediately before the current position.

EDIT

read: STAR text STAR text STAR text
| STAR text STAR KEY_WORD STAR text
;

text: STR+ ;

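@lexer::members {
  // Returns true if the characters from the start of the current token
  // (at most maxAmountOfCharacters of them) match the pattern in full.
  private boolean checkAhead(int maxAmountOfCharacters, String pattern) {
    final Interval ahead = new Interval(this._tokenStartCharIndex, this._tokenStartCharIndex + maxAmountOfCharacters - 1);
    return this._input.getText(ahead).matches(pattern);
  }
}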

KEY_WORD: LETTER LETTER LETTER LETTER LETTER ;
STAR :'*';

STR: { !checkAhead(6, "([A-Z]){5}[*](\\D|$)") }?
    (
        LETTER
        | DIGIT
    )+
;

I want the input to be read as KEY_WORD only when it appears as STAR KEY_WORD STAR. For now, if the last word of a text matches [A-Z]{5}, it is matched as KEY_WORD.


Solution

  • You can pass a negative offset to the lookahead functions LA() and LT(). On a token stream, LA() returns only the token type, while LT() returns the entire token; on a character stream, LA() returns the character. Note: LA(0) is not defined, but you can use LA(-1), LA(-2), LT(5), etc.

    Another note: looking back further than one step works only with buffered token streams. Unbuffered streams only cache a single token (the previous one).

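    The TokenStream interface (the type of this._input in a parser) defines LT(). LA() is declared in the IntStream interface, the common ancestor of token streams and of character streams such as CharStream and ANTLRInputStream. In a lexer, this._input is a CharStream, so only LA() is available there, and LA(-1) returns the character just before the current position.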
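
To make the offset convention concrete, here is a minimal, self-contained sketch in plain Java. It does not use the ANTLR runtime: MiniCharStream is a hypothetical stand-in that only mimics the LA() convention described above (LA(1) is the next unread symbol, LA(-1) the one just consumed, LA(0) undefined), and keywordAllowedAt is an illustrative helper, not part of any ANTLR API. It accepts a five-letter key word only when a '*' sits directly before and after it, which is the condition the question asks for.

```java
class LookbackSketch {
    static final int EOF = -1;

    /** Hypothetical stand-in for a buffered character stream; NOT ANTLR's CharStream. */
    static final class MiniCharStream {
        private final String data;
        private int index = 0; // position of the next unread character

        MiniCharStream(String data) { this.data = data; }

        void consume() { if (index < data.length()) index++; }

        /** LA(1) is the next unread char, LA(-1) the last consumed one; LA(0) is left undefined. */
        int LA(int i) {
            if (i == 0) throw new IllegalArgumentException("LA(0) is undefined");
            int pos = (i > 0) ? index + i - 1 : index + i;
            return (pos < 0 || pos >= data.length()) ? EOF : data.charAt(pos);
        }
    }

    /** True if five capitals start at 'start' with a '*' directly before and after them. */
    static boolean keywordAllowedAt(String input, int start) {
        MiniCharStream in = new MiniCharStream(input);
        for (int k = 0; k < start; k++) in.consume(); // move to the candidate token start

        if (in.LA(-1) != '*') return false;           // lookback: previous character
        for (int i = 1; i <= 5; i++) {                // lookahead: five capital letters
            int c = in.LA(i);
            if (c < 'A' || c > 'Z') return false;
        }
        return in.LA(6) == '*';                       // lookahead: closing star
    }

    public static void main(String[] args) {
        System.out.println(keywordAllowedAt("*HELLO*", 1));
        System.out.println(keywordAllowedAt("*abcHELLO*", 4));
        System.out.println(keywordAllowedAt("HELLO*", 0));
    }
}
```

In the grammar itself, the lookback half of this check would presumably be a predicate at the start of the KEY_WORD rule, something like { _input.LA(-1) == '*' }?, since at that point the character before the token start is the last one consumed. Treat that as a starting point to test against your input rather than a verified fix.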