Search code examples
antlrantlr4state-machineregular-languageautomata

How to understand ATN graph generated for ANTLR grammar?


I have 2 simple lexer rules in my ANTLR4 grammar:

fragment Attrs : '.' ARCH; 
fragment ARCH : 'IA32' | 'X64' | 'IPF' | 'EBC' | 'common';

The generated ATN with ANTLR4.7 is like this (Visual Studio Code):

enter image description here

I searched some references about "ATN", such as this one.

It's beautiful but I don't understand it:

  • What does the number and label in the node mean?
  • What does the epsilon symbol on the arrow line mean?
  • What do the grey and red nodes mean?

Solution

  • The ATN graph in the image represents a single rule, consisting of ATN states produced by the parser generator. These are mostly interesting for developers who want to write code that uses the ATN in some way (e.g. for code completion). Usually you don't need this information for your grammar work. It might also come in handy to see how the ATN graph changes when you change something in the grammar (for fine tuning of your grammar).

    What you see in the image are circles for ATN states, with their unique ID (no 2 states share the same state number) along with a label indicating the state's type (rule start/end state, basic state etc.). You can get a bit more info by hovering with the mouse over a state until you get the tooltip. The rounded rectangle describes a rule called by this rule.

    Most states are connected via transitions which describe the direction a parser has to walk when executing this state machine. A transition can be executed without consuming input (which is then called an epsilon transition, marked by that little epsilon symbol) or requires certain input to match (which is denoted as label in the ATN and also attached to the transition arrow in the graph image).