I am writing a lexical analyzer and have run into a problem. After reading all the characters from the source code into a string, I process it character by character and build the tokens. The result is a list containing language tokens, spaces, line breaks and one character that I cannot identify and need to remove. This is the loop I use to clean the list:
for (int i = 0; i < tokenList.size(); i++) {
    // Remove spaces
    if (tokenList.get(i).getLexema().equals(" ")) {
        tokenList.remove(i);
    }
    // Remove empty strings
    else if (tokenList.get(i).getLexema().length() == 0) {
        print("ada");
        tokenList.remove(i);
    }
    // Remove tabs
    else if (tokenList.get(i).getLexema().equals("\t")) {
        tokenList.remove(i);
    }
    // Remove line breaks
    else if (tokenList.get(i).getLexema().equals("\n")) {
        print("ASD");
        tokenList.remove(i);
    }
}
Given the following input:
int a;
char n;
After the analysis and the cleanup, I get the following result:
00 - Lex: int
01 - Lex: a
02 - Lex: ;
03 - Lex:
04 - Lex: char
05 - Lex: n
06 - Lex: ;
Entry 03 looks empty, but none of my checks match it, and I do not know how to remove it.
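One way to find out what an invisible lexeme like this really contains is to print the numeric code of each of its characters. The sketch below is only an illustration: the class `LexemeInspector` and the method `describe` are hypothetical names, not part of the analyzer in this post.

```java
public class LexemeInspector {

    // Returns the code of every char in the lexeme as "U+XXXX", separated by spaces.
    // An empty lexeme yields an empty string.
    static String describe(String lexeme) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < lexeme.length(); i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", (int) lexeme.charAt(i)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // A carriage return is invisible when printed, but its code is not.
        System.out.println(describe("\r"));
    }
}
```

Calling something like `describe(tokenList.get(3).getLexema())` on the suspicious entry would show whether it holds a space, a tab, a carriage return or something else.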
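SOLUTION:
Thanks to the people who answered, I was able to solve the problem. The unidentified character was a carriage return ("\r"), which most likely came from Windows-style "\r\n" line endings in the input file. The loop also had a second bug: remove(i) shifts the following elements one position to the left, so the element that moves into position i was skipped unless i is decremented after each removal. The revised code: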
for (int i = 0; i < tokenList.size(); i++) {
    String lexema = tokenList.get(i).getLexema();
    switch (lexema) {
        case "":    // empty string
        case " ":   // space
        case "\t":  // tab
        case "\n":  // line feed
        case "\r":  // carriage return (the unidentified character)
            tokenList.remove(i);
            // Step back so the element shifted into position i is checked too
            i = i - 1;
            break;
        default:
            break;
    }
}
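As an alternative to adjusting the index by hand, the same cleanup can be sketched with `List.removeIf` (Java 8 or later), which handles the shifting internally. This is a sketch under assumptions, not the code from the answers: the `Token` class below is a hypothetical stand-in that only mirrors the `getLexema()` accessor used above, and `removeBlankTokens` is an illustrative name. Note that `trim().isEmpty()` is slightly broader than the switch: it also drops lexemes made of several whitespace characters, such as "\r\n".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class TokenCleaner {

    // Hypothetical stand-in for the token class in the post;
    // only getLexema() is taken from the original code.
    static class Token {
        private final String lexema;

        Token(String lexema) {
            this.lexema = lexema;
        }

        String getLexema() {
            return lexema;
        }
    }

    // Removes every token whose lexeme is empty or contains only
    // characters that String.trim() strips (space, \t, \n, \r, ...).
    static void removeBlankTokens(List<Token> tokenList) {
        tokenList.removeIf(t -> t.getLexema().trim().isEmpty());
    }

    public static void main(String[] args) {
        List<Token> tokens = new ArrayList<>();
        for (String s : Arrays.asList("int", " ", "a", ";", "\r", "\n",
                                      "char", " ", "n", ";", "")) {
            tokens.add(new Token(s));
        }
        removeBlankTokens(tokens);
        for (Token t : tokens) {
            System.out.println("Lex: " + t.getLexema());
        }
    }
}
```

Because `removeIf` visits every element exactly once, two blank tokens in a row (for example "\r" followed by "\n") are both removed without any index bookkeeping.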