I found a case where ANTLR4 does not terminate and gets into an infinite loop. The case below is the smallest reduction I could make from a larger grammar. I do not need a solution; I only want to know whether this is an ANTLR4 bug or a misunderstanding on my part. If it is a misunderstanding, I would appreciate an explanation of what is happening.
Note: I tried several versions of ANTLR4, including the latest one (4.10.1), and got the same result.
This is the simplest grammar I found:
grammar NotEndingIfMismatchedInput;
document : TEXT EOF ;
TEXT : [a-zA-Z]* ;
If the input contains a mismatch (for example 'aaa44'), the parser gets into an infinite loop and never terminates.
This is the simplest class I could write that never terminates:
import generated.NotEndingIfMismatchedInputLexer;
import generated.NotEndingIfMismatchedInputParser;
import generated.NotEndingIfMismatchedInputParser.DocumentContext;
import org.antlr.v4.runtime.CharStreams;
import org.antlr.v4.runtime.CodePointCharStream;
import org.antlr.v4.runtime.CommonTokenStream;
public class Test
{
    public static void main(String[] args)
    {
        CodePointCharStream input = CharStreams.fromString("aaa44");
        NotEndingIfMismatchedInputLexer lexer = new NotEndingIfMismatchedInputLexer(input);
        CommonTokenStream tokens = new CommonTokenStream(lexer);
        NotEndingIfMismatchedInputParser parser = new NotEndingIfMismatchedInputParser(tokens);
        System.out.println("Beginning to parse");
        // This call never returns for the input "aaa44".
        DocumentContext foo = parser.document();
        System.out.print("Unreachable code: the parser is stuck in an infinite loop!");
    }
}
The output is:
Beginning to parse
line 1:3 mismatched input '' expecting <EOF>   <-- written to System.err
and then it never stops.
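Never define a lexer rule that can match the empty string. An empty match consumes no input, so the lexer can keep emitting an empty token at the same position without ever reaching EOF: there are infinitely many empty strings in your input, hence the infinite loop. ANTLR also warns about such rules when it generates the lexer (non-fragment lexer rule TEXT can match the empty string).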
Instead of doing:
TEXT : [a-zA-Z]*;
write:
TEXT : [a-zA-Z]+;
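To see why the `*` version cannot make progress, here is a minimal sketch of a toy lexer loop in plain Java. It is not ANTLR's implementation: the class `EmptyTokenDemo`, the method `tokenize`, and the `maxTokens` cap are all hypothetical names introduced for illustration, and `java.util.regex` stands in for the lexer rule. The cap exists only so the demonstration terminates.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EmptyTokenDemo {
    // Toy lexer loop (an illustration, not ANTLR's algorithm): repeatedly match
    // `rule` at the current position and emit the matched text as a token.
    // maxTokens is a safety cap so the empty-match case stops.
    static List<String> tokenize(String input, String rule, int maxTokens) {
        Pattern pattern = Pattern.compile(rule);
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < input.length() && tokens.size() < maxTokens) {
            Matcher m = pattern.matcher(input).region(pos, input.length());
            if (!m.lookingAt()) {
                break; // no token can start here; a real lexer would report an error
            }
            tokens.add(m.group());
            pos = m.end(); // an empty match leaves pos unchanged, so nothing is consumed
        }
        return tokens;
    }

    public static void main(String[] args) {
        // With '*', the rule matches "" at position 3 again and again until the cap.
        System.out.println(tokenize("aaa44", "[a-zA-Z]*", 5));
        // With '+', the rule cannot match at '4', so the loop stops after "aaa".
        System.out.println(tokenize("aaa44", "[a-zA-Z]+", 5));
    }
}
```

With the `*` rule the loop emits "aaa" and then only empty tokens at position 3 until the cap is hit; without the cap it would never end. With the `+` rule the toy loop simply stops at '4', whereas a real ANTLR lexer would report a token recognition error there and continue.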