I'm trying to write a JavaCC grammar to parse a file that contains multi-line comments. For example, the following are all valid:
/**/
/* */
/* This is a comment */
/* This
is
a
multiline
comment
*/
I would like the parsing to fail if there is a /* not closed by a */, or a closing */ without an opening /*.
I'm not trying to skip the comments, I want the comments available as tokens.
So far I have tried this method, which works but will not fail on an unclosed /*:
options {
STATIC = false;
}
PARSER_BEGIN(BlockComments)
package com.company;
public class BlockComments {}
PARSER_END(BlockComments)
TOKEN : { < START_BLOCK_COMMENT : "/*" > : WITHIN_BLOCK_COMMENT }
<WITHIN_BLOCK_COMMENT> TOKEN: { < BLOCK_COMMENT: (~["*", "/"] | "*" ~["/"])+ > }
<WITHIN_BLOCK_COMMENT> TOKEN: { < END_BLOCK_COMMENT: "*/" > : DEFAULT }
SKIP : {
"\n"
}
The other option I have tried is this, which has the same problem and the slight difference that /* and */ are skipped instead of being read as tokens:
options {
STATIC = false;
}
PARSER_BEGIN(BlockComments)
package com.company;
public class BlockComments {}
PARSER_END(BlockComments)
SKIP : { "/*" : WITHIN_BLOCK_COMMENT }
<WITHIN_BLOCK_COMMENT> TOKEN: { <BLOCK_COMMENT: (~["*", "/"] | "*" ~["/"])+ > }
<WITHIN_BLOCK_COMMENT> SKIP : { "*/" : DEFAULT }
SKIP : {
"\n"
}
I tried using MORE : { "/*" : WITHIN_BLOCK_COMMENT } in the second option, which makes sure parsing fails for an unclosed /*, but it makes all of the BLOCK_COMMENT tokens start with /*, which I don't want.
I'm not sure what the rest of your file looks like, so I'll assume that a file is expected to be a sequence of comments preceded, followed, and separated by zero or more spaces and newlines.
What I would do is this:
TOKEN : { < START_BLOCK_COMMENT : "/*" > : WITHIN_BLOCK_COMMENT }
<WITHIN_BLOCK_COMMENT> TOKEN: { <CHAR_IN_COMMENT: ~[] > }
<WITHIN_BLOCK_COMMENT> TOKEN: { < END_BLOCK_COMMENT: "*/" > : DEFAULT }
SKIP : {
"\n" | " "
}
Although CHAR_IN_COMMENT matches any single character, the token manager always prefers the longest match, so the two-character */ is still recognized as END_BLOCK_COMMENT rather than as two CHAR_IN_COMMENT tokens.
Now in the parser we have
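void start() : { String s; } {
    (
        s = comment() { System.out.println(s); }
    )*
    // Require end of file so that nothing is left unparsed after the last comment.
    <EOF>
}
String comment() :
{
    Token t;
    // Accumulates the comment body one character token at a time.
    StringBuilder b = new StringBuilder();
}
{
    <START_BLOCK_COMMENT>
    (
        t = <CHAR_IN_COMMENT> { b.append(t.image); }
    )*
    <END_BLOCK_COMMENT>
    { return b.toString(); }
}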
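With this grammar you don't get a lexical error for a missing */, but you do get a ParseException: the parser reaches the end of the file while it is still expecting a CHAR_IN_COMMENT or an END_BLOCK_COMMENT.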
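If it helps to see the two lexical states outside of JavaCC, here is a rough plain-Java sketch of the same logic. It is a hand-written stand-in and not what JavaCC generates: the class name BlockCommentSketch and the method parse are made up for illustration, and it throws IllegalArgumentException where the generated code would throw a ParseException or a TokenMgrError.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; mimics the DEFAULT and WITHIN_BLOCK_COMMENT states by hand.
public class BlockCommentSketch {

    /** Returns the body of each comment, or throws IllegalArgumentException on malformed input. */
    static List<String> parse(String input) {
        List<String> comments = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            // DEFAULT state: spaces and newlines are skipped, as in the SKIP rule.
            if (c == ' ' || c == '\n') {
                i++;
                continue;
            }
            // DEFAULT state: the only token allowed is "/*", so a stray "*/" fails here.
            if (!input.startsWith("/*", i)) {
                throw new IllegalArgumentException("unexpected input at offset " + i);
            }
            i += 2;
            // WITHIN_BLOCK_COMMENT state: collect single characters until "*/" is seen.
            StringBuilder body = new StringBuilder();
            while (true) {
                if (i >= input.length()) {
                    // End of input before "*/": this is the unclosed-comment case.
                    throw new IllegalArgumentException("unclosed comment");
                }
                if (input.startsWith("*/", i)) {
                    i += 2;
                    break;
                }
                body.append(input.charAt(i++));
            }
            comments.add(body.toString());
        }
        return comments;
    }

    public static void main(String[] args) {
        // Prints the list of comment bodies found in the sample input.
        System.out.println(parse("/**/ /* */ /* This\nis */"));
    }
}
```

Note that whitespace is only skipped between comments, so spaces and newlines inside a comment are kept in the returned text, just as the SKIP rule in the grammar applies only to the DEFAULT state.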