I'm writing a lexical specification for JFlex (it's like flex, but for Java). I have a problem with TraditionalComment (/* */) and DocumentationComment (/** */). So far I have this, taken from the JFlex User's Manual:
LineTerminator = \r|\n|\r\n
InputCharacter = [^\r\n]
WhiteSpace = {LineTerminator} | [ \t\f]
/* comments */
Comment = {TraditionalComment} | {EndOfLineComment} | {DocumentationComment}
TraditionalComment = "/*" [^*] ~"*/" | "/*" "*"+ "/"
EndOfLineComment = "//" {InputCharacter}* {LineTerminator}
DocumentationComment = "/**" {CommentContent} "*"+ "/"
CommentContent = ( [^*] | \*+ [^/*] )*
{Comment} { /* Ignore comments */ }
{LineTerminator} { return LexerToken.PASS; }
LexerToken.PASS means that the line terminators are later passed through to the output. Now, what I want to do is:
Ignore everything inside a comment except the line terminators.
For example, consider this input:
/* Some
* quite long comment. */
As a raw string, this is /* Some\n * quite long comment. */\n. With the current lexer it is collapsed to a single line: the output is a single '\n'. I would like to get two lines, '\n\n'. In general, I want the output to always have the same number of lines as the input. How can I do that?
After a couple of days I found a solution. I'll post it here in case somebody else has the same problem.
The trick is: once you have recognized a comment, go through its body once more, and whenever you spot a line terminator, pass it on instead of ignoring it:
%{
public StringBuilder newLines;
%}
// ...
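{Comment} {
    // Re-scan the matched comment text (yytext()) and keep only its '\n'
    // characters, so they can be passed on to the output.
    // Note: "\r\n" counts once, but a lone '\r' terminator is not counted.
    newLines = new StringBuilder();
    for (char c : yytext().toCharArray()) {
        if (c == '\n') {
            newLines.append(c);
        }
    }
    return LexerToken.NEW_LINES;
}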
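One caveat with the action above: it only looks for '\n', while LineTerminator in the spec also allows a lone '\r'. If your input can contain such line endings, the scan could be moved into a small helper along these lines. This is only a sketch: the class name CommentNewlines and the method keepLineTerminators are made up for illustration and are not part of JFlex, and it assumes you are happy to normalize every terminator to '\n' in the output.

```java
// Hypothetical helper (not part of JFlex): keeps only the line terminators
// of a matched comment, normalizing each of them to '\n'.
final class CommentNewlines {
    private CommentNewlines() {}

    /** Returns one '\n' for every line terminator (\r\n, \r or \n) in text. */
    static String keepLineTerminators(String text) {
        StringBuilder newLines = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '\r') {
                // Treat "\r\n" as a single terminator by skipping the '\n'.
                if (i + 1 < text.length() && text.charAt(i + 1) == '\n') {
                    i++;
                }
                newLines.append('\n');
            } else if (c == '\n') {
                newLines.append('\n');
            }
        }
        return newLines.toString();
    }
}
```

Inside the {Comment} action you could then write newLines = new StringBuilder(CommentNewlines.keepLineTerminators(yytext())); and return LexerToken.NEW_LINES as before.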