Tags: java, compiler-construction

What hidden character could be in my code in Java?


I am writing a lexical analyzer and am running into a problem. After reading all the characters from the source code, I put them in a string, read it character by character and perform the appropriate operations. In the end, this generates a list containing language tokens, spaces, line breaks and... a damn character I cannot identify and need to clean up.

for (int i = 0; i < tokenList.size(); i++) {
    // Remove spaces
    if (tokenList.get(i).getLexema().equals(" ")) {
        tokenList.remove(i);
    }
    // Remove empty strings
    else if (tokenList.get(i).getLexema().length() == 0) {
        print("ada");
        tokenList.remove(i);
    }
    // Remove tabs
    else if (tokenList.get(i).getLexema().equals("\t")) {
        tokenList.remove(i);
    }
    // Remove line breaks
    else if (tokenList.get(i).getLexema().equals("\n")) {
        print("ASD");
        tokenList.remove(i);
    }
}
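Independently of the hidden character, this loop has an index problem: after tokenList.remove(i), every later element shifts left by one, but the for loop still increments i, so the element that moved into slot i is never examined. The sketch below reproduces the effect with plain strings instead of the question's token objects; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ForwardRemoveDemo {
    // Mirrors the question's loop: i always advances, so the element that
    // shifts into slot i after a removal is skipped.
    static List<String> removeForwardBuggy(List<String> input) {
        List<String> list = new ArrayList<>(input);
        for (int i = 0; i < list.size(); i++) {
            if (list.get(i).equals(" ") || list.get(i).equals("\n")) {
                list.remove(i); // remove(int): removes by index
            }
        }
        return list;
    }

    // Same loop, but steps i back after each removal so the shifted element is checked.
    static List<String> removeForwardFixed(List<String> input) {
        List<String> list = new ArrayList<>(input);
        for (int i = 0; i < list.size(); i++) {
            if (list.get(i).equals(" ") || list.get(i).equals("\n")) {
                list.remove(i);
                i--;
            }
        }
        return list;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("a", ";", "\n", " ", "b");
        System.out.println(removeForwardBuggy(tokens).size()); // 4: the " " after "\n" was skipped
        System.out.println(removeForwardFixed(tokens).size()); // 3: only a, ; and b remain
    }
}
```

This is why stepping i back with i = i - 1 after each removal is needed whenever elements are removed by index inside a forward loop.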

Given the following input:

int a;
char n;

After the analysis and the clean-up, I get the following result:

00 - Lex: int
01 - Lex: a
02 - Lex: ;
03 - Lex: 
04 - Lex: char
05 - Lex: n
06 - Lex: ;

There is an empty-looking token at index 03, and I do not know how to remove it.
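When a lexeme prints as nothing, one way to identify it is to print its Unicode code points instead of the raw text. A minimal sketch; RevealLexeme and reveal are hypothetical helpers, not part of the question's code, and the sample lexemes are made up to resemble the output above.

```java
class RevealLexeme {
    // Renders a lexeme so invisible characters show up as their Unicode
    // code points, e.g. a carriage return becomes "[U+000D]".
    static String reveal(String lexeme) {
        StringBuilder out = new StringBuilder();
        lexeme.codePoints().forEach(cp -> {
            if (Character.isISOControl(cp) || Character.isWhitespace(cp)) {
                out.append(String.format("[U+%04X]", cp));
            } else {
                out.appendCodePoint(cp);
            }
        });
        return out.toString();
    }

    public static void main(String[] args) {
        // Hypothetical lexemes around the mysterious index 03
        String[] lexemes = {";", "\r", "char"};
        for (int i = 0; i < lexemes.length; i++) {
            System.out.printf("%02d - Lex: %s%n", i, reveal(lexemes[i]));
        }
    }
}
```

With a dump like this, a carriage return shows up as [U+000D] instead of an apparently empty line.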


Solution

  • SOLUTION:

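    Well, the people who answered were incredible, and I was able to solve my problem. The unidentified character was a carriage return ("\r"), which typically appears when the source file uses Windows-style line endings ("\r\n"), so it was never matched by the "\n" check. The solution, using some better coding strategies (a switch on the lexeme, and stepping i back after each removal so the next element is not skipped):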

    for (int i = 0; i < tokenList.size(); i++) {
        String lexema = tokenList.get(i).getLexema();

        switch (lexema) {
            case "":   // Remove empty strings
            case " ":  // Remove spaces
            case "\t": // Remove tabs
            case "\n": // Remove line breaks
            case "\r": // Remove the strange character (carriage return)
                tokenList.remove(i);
                i = i - 1; // step back so the element that shifted into slot i is checked next
                break;
            default:
                break;
        }
    }
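If the goal is only to drop every empty or whitespace-only lexeme, List.removeIf avoids the manual index bookkeeping entirely. A sketch assuming Java 11+ (for String.isBlank) and a hypothetical Token class standing in for the question's token type with its getLexema() method:

```java
import java.util.ArrayList;
import java.util.List;

class RemoveBlankTokens {
    // Hypothetical stand-in for the question's token class; only getLexema() matters here.
    static final class Token {
        private final String lexema;

        Token(String lexema) {
            this.lexema = lexema;
        }

        String getLexema() {
            return lexema;
        }
    }

    // Drops every token whose lexeme is empty or consists only of whitespace
    // (space, \t, \n, \r, ...). String.isBlank() requires Java 11+.
    static void removeBlank(List<Token> tokenList) {
        tokenList.removeIf(token -> token.getLexema().isBlank());
    }

    public static void main(String[] args) {
        List<Token> tokens = new ArrayList<>(List.of(
                new Token("int"), new Token(" "), new Token("a"), new Token(";"),
                new Token("\r"), new Token("\n"), new Token("char"), new Token(""),
                new Token("n"), new Token(";")));
        removeBlank(tokens);
        // prints int, a, ;, char, n, ; one per line
        tokens.forEach(t -> System.out.println(t.getLexema()));
    }
}
```

One design difference: isBlank() treats every character for which Character.isWhitespace returns true as blank (form feed, for example), so it removes a slightly broader set than the five explicit cases in the switch. If only those exact lexemes should go, the switch is the more precise choice.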