Search code examples
antlr4

What is the difference between Atom Transition, Set Transition and Epsilon Transition in the ANTLR4 ATN?


What is the difference between Atom Transition, Set Transition and Epsilon Transition in ANTLR4 ATN? Couldn't find any definitions online.


Solution

  • You won't find any definition, because that's an internal detail which is of no interest for most people.

    The different transition types are mostly used to indicate the condition to match when the ATN walking algorithm processes them. There are 10 of them:

    1. Epsilon: a transition that has no condition and consumes no input. The walker just skips them.
    2. Range: probably an older version of the Set transition and not used in ANTLR4. The only difference is that Range takes start and end values of the range to match, while Set takes an interval set.
    3. Rule: an epsilon transition in the parser ATN to a subrule. Consumes nothing.
    4. Predicate: an epsilon transition with an attached predicate as condition. Consumes nothing.
    5. Atom: matches a single input symbol and consumes only one.
    6. Action: an epsilon transition that consumes nothing but has an action attached (executed in target code when this transition is being processed).
    7. Set: matches a set of intervals (to allow gaps) and consumes only one input symbol. Typical example is: [a-zA-Z]. Used only in the lexer.
    8. Not set: a set transition with inverted intervals (Full Unicode minus the given set).
    9. Wildcard: matches any single input and consumes one input symbol.
    10. Precedence (aka Precedence Predicate): Not 100% sure about this one. Seems to be used for precedence in transformed left recursive rules and also has a predicate attached. In that sense it's a pretty special transition, compared to the others.

    Here's an example for some of the transitions:

    enter image description here

    This is the ATN for the rule: LETTER: [a-zA-Z] '$';. It starts with the ATN state 1 and has a single epsilon transition to the first basic state. That one has a single outgoing Set transtion to another basic state. From there an Atom transition goes out to yet another intermediate state and from there to the rule end.

    For this and more visualizations, grammar debugging and ANTLR4 language support install my vscode extension for ANTLR4 (provided you use Visual Studio Code as grammar editor).