I have this basic JFlex lexer:
import java.util.*;
%%
%public
%class TuringLexer
%type Void
%init{
yybegin(YYINITIAL);
%init}
%state COMM, GETALPH, MT, PARSELOOP, PARSELEMS, PARSESYMB, PARSEMT
%{
ArrayList<Character> alf = new ArrayList<Character>();
String crtMach;
String crtLoop;
String crtLoopContent;
String crtLoopContentParam;
String crtContent;
String crtSymb;
%}
//Input = [^\r\n]
SEP = [:space:]*
//COMM =[;.*$]
name = [A-Za-z_]*
tok=[A-Za-z0-9#$@\*]
AL = "alphabet :: "
cont = [^]]*
param =[^)]*
letter = [A-Za-z]
opn = [\[?]
symb = [^\}]+
%%
<COMM> {
"." { /* ignore */ System.out.println("Got into comm state ");}
"\n" {System.out.println("Got out of comm state ");yybegin(YYINITIAL);}
}
<GETALPH> {
{SEP} { /* ignore */ }
{tok} { String str = yytext();
System.out.println("Alphabet -- " + str);
Character c = str.charAt(0);
alf.add(c); }
";" {yybegin(YYINITIAL);}
}
<YYINITIAL> {
"\n" { /* ignore */ System.out.println("Got into YYINITIAL"); }
";" { yybegin(COMM); }
[^] { throw new Error("Illegal character <"+yytext()+">"); }
}
Some code has been removed for clarity; the issue still occurs with this reduced version, so it should be easier to identify here.
The input file is called simple.mt.
And this is the main class:
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.Reader;
import java.io.BufferedReader;
import java.io.FileReader;
public class MainClass {
public static void main(String args[]) throws IOException {
Reader reader = new BufferedReader(new FileReader ("simple.mt"));
reader.read();
TuringLexer tl = new TuringLexer(reader);
tl.yylex();
}
}
When I run the project in Eclipse (or from a terminal, for that matter) I get:
Exception in thread "main" java.lang.Error: Illegal character <l>
at TuringLexer.yylex(TuringLexer.java:576)
at MainClass.main(MainClass.java:11)
I have no idea what the error means or how to debug it. What remains of the JFlex file is a small sample, so the error shouldn't be that hard to figure out.
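So you have a character appearing in your input that your lexer has no rule for. In YYINITIAL the only specific rules are "\n" and ";", so any other character falls through to the [^] rule, which throws the Error you are seeing.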
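It may also help to check which characters actually reach the lexer. The main class in the question calls reader.read() once before constructing TuringLexer, and that call consumes the first character of the file. The sketch below reproduces this with a StringReader. The sample input string and the class and method names are hypothetical (the contents of simple.mt are not shown in the question), so treat it as an illustration rather than a diagnosis.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical demo class; not part of the question's project.
final class ReaderConsumptionSketch {

    /** Returns everything that is still unread in the given reader. */
    static String remaining(Reader reader) throws IOException {
        StringBuilder sb = new StringBuilder();
        int ch;
        while ((ch = reader.read()) != -1) {
            sb.append((char) ch);
        }
        return sb.toString();
    }

    /** Mimics the question's main(): one read() call before the reader is handed on. */
    static String afterOneRead(String input) {
        try (Reader reader = new BufferedReader(new StringReader(input))) {
            reader.read();            // consumes and discards the first character
            return remaining(reader); // this is all a lexer built on this reader would see
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Assumed sample input; the real file contents are unknown.
        System.out.println(afterOneRead("alphabet :: a b ;"));
    }
}
```

If the real file does start with a word such as "alphabet", this would explain why the first character reported is `l` and not `a`. Removing the stray read() would not make the error go away, though: the first letter would still hit the [^] rule in YYINITIAL.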
Every lex specification should end with a catch-all . rule that either prints an 'illegal character' error message (rather than throwing an exception) or simply returns yytext[0] (in JFlex, yytext().charAt(0)) to the parser for the parser to deal with.
The latter strategy also saves you from having to write a rule for each special character, for example =, + and so on: the parser should just use them as '=', '+', etc. Then (a) any illegal character just becomes a syntax error, but more importantly (b) the parser gets to use its error recovery, rather than just throwing the token away.
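As a rough illustration of the second strategy, here is a minimal hand-written scanner in plain Java. It is not JFlex output, and the class name, method name, and token spellings are all hypothetical. Names become NAME(...) tokens, whitespace is skipped, and every other character is passed through as its own single-character token instead of raising an error.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical scanner, only meant to illustrate the catch-all idea.
final class FallbackScannerSketch {

    /** Splits the input into tokens; unknown characters become their own token. */
    static List<String> scan(String input) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (Character.isWhitespace(c)) {
                i++; // skip whitespace
            } else if (Character.isLetter(c) || c == '_') {
                int start = i;
                while (i < input.length()
                        && (Character.isLetterOrDigit(input.charAt(i))
                            || input.charAt(i) == '_')) {
                    i++;
                }
                tokens.add("NAME(" + input.substring(start, i) + ")");
            } else {
                // Catch-all: hand the character itself to the parser
                // instead of throwing, so the parser decides what to do.
                tokens.add("'" + c + "'");
                i++;
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(scan("a = b + c"));
    }
}
```

With this shape, a character the grammar never mentions (say `?`) still reaches the parser as the token `'?'`, where it simply fails to match any production and is reported as an ordinary syntax error.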