c# antlr antlr4 antlr4cs

Antlr4: How to pass current token's value to lexer's predicate?


Is there a way to provide a lexer's predicate with the current token's value? For instance, my lexer grammar FlowLexer relies on tokens that I load dynamically before parsing:

var lexer = new FlowLexer(new AntlrInputStream(flowContent)) {
    TokenExists = tokenValue => tokensDictionary.ContainsKey(tokenValue)
};

And then during parsing/lexing, the TokenExists predicate is called:

@lexer::members{
    public Func<string,bool> TokenExists = null;
}

/* ... stuff ... */

TOK : [-_.0-9a-zA-Z]+ 
    {!TokenExists(/*WHAT GOES HERE?*/);}? 
    -> mode(IN_TOKEN);

/* ... stuff ... */

But how do I pass the token value to the TokenExists predicate?

(This is an attempt to create a context-aware lexer: I have several modes, and each one has different rules.)


Solution

  • Accessing token values in ANTLR4 predicates and actions is possible with a special syntax. For details see the Actions and Attributes doc.

    In general, you access a parsed token by using dollar sign and the token name, like

    a: x = INT {$x.text == "0"}?;
    

    or without a label (only if that token is referenced exactly once in the parser rule):

    a: INT {$INT.text == "0"}?;
    

    ANTLR4 translates such pseudo code into target language code to allow accessing token properties (e.g. in C++ this becomes: INT->getText() == "0").

    In lexer rules, however, this special access is not possible (ANTLR3 supported it, but ANTLR4 does not). You can still read the token's properties with native code. Strictly speaking, the token does not exist yet; what you read are the values the lexer will use to create it once the rule has finished. Such code is often not portable to other target languages, which does not matter if you only generate for a single target.

    The code in a lexer action or predicate is executed in the context of the lexer. The lexer holds the values from which the new token will be created after the rule has ended, so you can read the currently matched text directly:

    TOK : [-_.0-9a-zA-Z]+ {!TokenExists(Text)}? -> mode(IN_TOKEN);
    

    Text is a property of the C# lexer.
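
    Putting the pieces from the question together, a minimal sketch of the lexer fragment might look like the following. It assumes the C# target, that TokenExists is assigned before lexing starts (as in the question's setup code), and that the IN_TOKEN mode is defined elsewhere in the grammar.

    ```antlr
    @lexer::members {
        // Set by the caller before lexing; assumed to be non-null when the predicate runs.
        public Func<string,bool> TokenExists = null;
    }

    // Text holds the characters matched so far for the token being built.
    // The predicate body is a plain boolean expression, so it is written without a trailing semicolon.
    TOK : [-_.0-9a-zA-Z]+ {!TokenExists(Text)}? -> mode(IN_TOKEN);
    ```

    With the Java target, the matched text is read with getText() instead of Text, so the predicate and the members block would have to be adapted there. That is the portability cost mentioned above.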