java compiler-construction token lexer jflex

JFlex lexer that distinguishes "class brackets" from "method brackets"


I need to write a lexer for a Java source code plagiarism detector. Here is an example of what I want to achieve.

//Java code                                   Tokens:
public class Count {                          Begin Class
    public static void main(String[] args)    Var Def, Begin Method
        throws java.io.IOException {
      int count = 0;                          Var Def, Assign
      while (System.in.read() != -1)          Apply, Begin While
        count++;                              Assign, End While
      System.out.println(count+" chars.");    Apply

    }                                         End Method
}                                             End Class

I think JFlex is the right tool to generate the lexer. However, after looking through some examples, I cannot find a way to distinguish class brackets from method brackets. Most tokenizers I find just recognize them as the same token. Also, how do I distinguish a method apply (a method call) from a variable identifier?


Solution

  • I cannot find a way to distinguish class brackets and method brackets.

    There is nothing lexically different about them. "{".equals("{"). The way you distinguish them is by context in the parser. The lexer can't make that distinction, nor should it.

  • Also how do I distinguish a method apply from a variable identifier?

    In the lexer, you don't. An identifier is an identifier. The token stream generated from "f(x)" should be Identifier, OpeningParenthesis, Identifier, ClosingParenthesis.

    Now in the parser you'll recognize a function name by the fact that it's followed by an opening parenthesis, but again that's the parser's job, not the lexer's.
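
As a rough illustration of that split, here is a minimal Java sketch. It is not a JFlex specification and not a real Java parser. The class name ContextSketch, the methods lex and label, and the heuristics are all assumptions made up for this example: the lexer stage emits the same token for every "{", and a separate parser-like pass decides what each brace opens and whether an identifier followed by "(" is a declaration (directly inside a class body) or a call (anywhere else).

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: all names and heuristics here are illustrative only.
class ContextSketch {

    // Lexer stage: string literals, identifiers/keywords, integers,
    // then any other single non-space character. Every "{" is the same token.
    private static final Pattern TOKEN = Pattern.compile(
        "\"(?:\\\\.|[^\"\\\\])*\"|[A-Za-z_$][A-Za-z0-9_$]*|\\d+|\\S");

    // Keywords that may precede "(" without being a method name.
    private static final List<String> PAREN_KEYWORDS =
        Arrays.asList("if", "for", "while", "switch", "catch", "synchronized");

    static List<String> lex(String source) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(source);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    // Parser-like stage: labels braces and calls using the surrounding tokens.
    static List<String> label(List<String> tokens) {
        List<String> labels = new ArrayList<>();
        Deque<String> open = new ArrayDeque<>(); // kinds of currently open braces
        String pending = "Block";                // what the next "{" will open
        for (int i = 0; i < tokens.size(); i++) {
            String t = tokens.get(i);
            String next = i + 1 < tokens.size() ? tokens.get(i + 1) : "";
            if (t.equals("class")) {
                pending = "Class";
            } else if (t.equals("{")) {
                open.push(pending);
                labels.add("Begin " + pending);
                pending = "Block";
            } else if (t.equals("}")) {
                labels.add("End " + (open.isEmpty() ? "Block" : open.pop()));
            } else if (Character.isJavaIdentifierStart(t.charAt(0))
                    && next.equals("(") && !PAREN_KEYWORDS.contains(t)) {
                if ("Class".equals(open.peek())) {
                    pending = "Method";   // name( directly in a class body: declaration
                } else {
                    labels.add("Apply");  // name( inside a method body: call
                }
            }
        }
        return labels;
    }

    public static void main(String[] args) {
        String source = "public class Count {\n"
            + "  public static void main(String[] args) throws java.io.IOException {\n"
            + "    int count = 0;\n"
            + "    while (System.in.read() != -1)\n"
            + "      count++;\n"
            + "    System.out.println(count + \" chars.\");\n"
            + "  }\n"
            + "}\n";
        System.out.println(label(lex(source)));
    }
}
```

For the Count program from the question this labels the braces and calls as Begin Class, Begin Method, Apply, Apply, End Method, End Class. The heuristics are deliberately crude (for instance, a field initializer that calls a method would be mistaken for a declaration); a real tool would feed the JFlex token stream into a proper parser and emit the plagiarism-detector tokens from the grammar rules.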