
Is this an ANTLR4 bug or a misunderstanding? The parser never terminates


I found a case where ANTLR4 never terminates and gets stuck in an infinite loop. The case below is the smallest reduction I could make from a larger grammar. I do not need a workaround; I only want to know whether this is an ANTLR4 bug or a misunderstanding on my part. If it is a misunderstanding, I would appreciate an explanation of what is happening.

Note: I tried several versions of ANTLR4, including the latest one (4.10.1), and got the same result with all of them.

This is the simplest grammar I found:

grammar NotEndingIfMismatchedInput;

document    : TEXT  EOF ;
TEXT : [a-zA-Z]*  ;

If the input contains characters that the grammar does not accept (for example 'aaa44'), the parser enters an infinite loop and never terminates.

This is the simplest class I could write that never terminates:

import generated.NotEndingIfMismatchedInputLexer;
import generated.NotEndingIfMismatchedInputParser;
import generated.NotEndingIfMismatchedInputParser.DocumentContext;

import org.antlr.v4.runtime.CharStreams;
import org.antlr.v4.runtime.CodePointCharStream;
import org.antlr.v4.runtime.CommonTokenStream;

public class Test
  {
  public static void main(String[] args)
    {
    CodePointCharStream input = CharStreams.fromString("aaa44");
    NotEndingIfMismatchedInputLexer lexer = new NotEndingIfMismatchedInputLexer(input);   
    CommonTokenStream tokens = new CommonTokenStream(lexer);
    NotEndingIfMismatchedInputParser parser = new NotEndingIfMismatchedInputParser(tokens);
    System.out.println("Beginning to parse");
    DocumentContext foo = parser.document();
    System.out.print("Unreachable code. The parser is in an infinite loop!!!");
    }
  }

The output is:

Beginning to parse
line 1:3 mismatched input '' expecting <EOF>   <-- In the System.err

After that, the program never terminates.


Solution

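  • Never define a lexer rule that can match the empty string. At position 3 (just before the first '4'), TEXT matches zero characters, so the lexer emits an empty TEXT token without consuming any input, and then does exactly the same thing on the next call. The token stream therefore never reaches EOF, which is why you see mismatched input '' at 1:3 and then the parse never finishes.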

    Instead of:

    TEXT : [a-zA-Z]*;

    use:

    TEXT : [a-zA-Z]+;
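
To make the mechanism concrete, here is a minimal sketch in plain Java. It does not use the ANTLR runtime: it only mimics a single-rule lexer with java.util.regex, and the names EmptyMatchDemo, countTokens and the cap parameter are hypothetical, introduced just for this illustration. The cap exists only so the demonstration stops; the real lexer has no such limit.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class EmptyMatchDemo {

    /**
     * Simulates a lexer with a single rule and counts the tokens it emits
     * before reaching the end of the input. Stops early once 'cap' tokens
     * have been emitted (a safety limit that exists only in this sketch).
     */
    static int countTokens(String rule, String input, int cap) {
        Pattern pattern = Pattern.compile(rule);
        int pos = 0;
        int emitted = 0;
        while (pos < input.length() && emitted < cap) {
            Matcher m = pattern.matcher(input);
            m.region(pos, input.length());
            if (m.lookingAt()) {
                // The rule matched at the current position, possibly with
                // zero characters. An empty match leaves pos unchanged.
                pos = m.end();
                emitted++;
            } else {
                // No rule matches here: skip one character and move on.
                pos++;
            }
        }
        return emitted;
    }

    public static void main(String[] args) {
        // With '*', the rule matches the empty string before "44" forever,
        // so only the cap stops the loop.
        System.out.println(countTokens("[a-zA-Z]*", "aaa44", 1000)); // prints 1000
        // With '+', the rule cannot match before "44", so the input is consumed.
        System.out.println(countTokens("[a-zA-Z]+", "aaa44", 1000)); // prints 1
    }
}
```

With the '+' version, the digits simply produce no token in this sketch. In the real generated lexer I would expect them to be reported as token recognition errors and skipped, after which EOF is reached normally.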