antlr lexical-analysis lines-of-code

Can ANTLR return Lines of Code when lexing?


I am trying to use ANTLR to analyse a large set of code using the full Java grammar. Since ANTLR needs to open all the source files and scan them anyway, I am wondering if it can also return the lines of code (LoC).

I checked the API for Lexer and Parser, and it seems they do not return LoC. Is it easy to instrument the grammar rules a bit to get LoC? The full Java grammar is complicated, and I don't really want to modify a large part of it.


Solution

  • If you have an existing ANTLR grammar, and want to count certain things during parsing, you could do something like this:

    grammar ExistingGrammar;
    
    // ...
    
    @parser::members {
      public int loc = 0;
    }
    
    // ...
    
    someParserRule
     : SomeLexerRule someOtherParserRule {loc++;}
     ;
    
    // ...
    

    So, whenever your parser matches a someParserRule, loc is increased by one, because the action {loc++;} is placed at the end (or at the start) of that rule's alternative.

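    Whatever your definition of a line of code is, place {loc++;} in the corresponding rule to increase the counter. Be careful not to increase it twice, as in the following grammar, where a statement that matches someParserRule is counted by both rules: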

    statement
     : someParserRule {loc++;}
     | // ...
     ;
    
    someParserRule
     : SomeLexerRule someOtherParserRule {loc++;}
     ;
    

    EDIT

    I just noticed that in the title of your question you asked if this can be done during lexing. That won't be possible, because the lexer only sees individual tokens and not the construct they belong to. Let's say a LoC would always end with a ';'. During lexing, you wouldn't be able to make a distinction between a ';' after, say, an assignment (which is a single LoC), and the 2 ';'s inside a for(int i = 0; i < n; i++) { ... } statement (which wouldn't be 2 LoC).
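
To make the ';' problem concrete, here is a minimal plain-Java sketch that does not use ANTLR at all. The class and method names (SemicolonCount, naiveCount, contextAwareCount) are made up for illustration. The first method treats every ';' the same, roughly what a token-level count would do. The second tracks parenthesis depth as a crude stand-in for the context a parser has. It assumes that a LoC ends with ';' and it ignores string literals and comments, so it is not a real LoC counter.

```java
public class SemicolonCount {

    // Token-level view: every ';' looks the same, so all of them are counted.
    static int naiveCount(String src) {
        int count = 0;
        for (char c : src.toCharArray()) {
            if (c == ';') {
                count++;
            }
        }
        return count;
    }

    // Context-aware view: a ';' nested inside parentheses (such as a for
    // header) is skipped. Simplification: strings and comments are not handled.
    static int contextAwareCount(String src) {
        int depth = 0;
        int count = 0;
        for (char c : src.toCharArray()) {
            if (c == '(') {
                depth++;
            } else if (c == ')') {
                depth--;
            } else if (c == ';' && depth == 0) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        String src = "int x = 0; for (int i = 0; i < n; i++) { x += i; }";
        System.out.println("naive: " + naiveCount(src));
        System.out.println("context-aware: " + contextAwareCount(src));
    }
}
```

The two counts differ on the sample input because of the two ';' inside the for header. In a grammar, that context comes for free: the {loc++;} action sits in the parser rule, so it only fires when the whole construct has been matched.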