I know how to skip these comments using SKIP declarations, but all I need to do is to take a C source and output the same source without comments.
So I declared a token <GENERIC_TEXT: (~[])+ > that gets copied to the output, but now the comments aren't skipped. I suspect this token takes all the input for itself.
Can someone help me, please?
Thank you
Don't use (~[])+: it will gobble up all your input, because the lexer prefers the longest match. That is probably why you didn't see tokens being skipped.
In your default lexer state, switch to a different state when you encounter "/*" (the beginning of a multi-line comment). In this different state, either match "*/" (and switch back to the default lexer state), or match any single char ~[] (not (~[])+!).
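To see why the one-or-more version swallows everything, here is a minimal sketch in plain Java (not JavaCC) that imitates the longest-match rule with java.util.regex. The class name LongestMatchDemo, the rule names and the regexes are only illustrative stand-ins for the grammar's tokens, and the tie-breaking (first listed rule wins) is modelled on how JavaCC orders its regular expression productions.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LongestMatchDemo {

    // Try every rule at the current position and keep the longest match;
    // on a tie, the rule listed first wins.
    static List<String> tokenize(String input, Map<String, Pattern> rules) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            String bestName = null;
            int bestEnd = pos;
            for (Map.Entry<String, Pattern> rule : rules.entrySet()) {
                Matcher m = rule.getValue().matcher(input);
                m.region(pos, input.length());
                if (m.lookingAt() && m.end() > bestEnd) {
                    bestName = rule.getKey();
                    bestEnd = m.end();
                }
            }
            if (bestName == null) {
                throw new IllegalStateException("no rule matches at offset " + pos);
            }
            tokens.add(bestName + ":" + input.substring(pos, bestEnd));
            pos = bestEnd;
        }
        return tokens;
    }

    // Stand-in for (~[])+ : any run of characters, newlines included.
    static Map<String, Pattern> greedyRules() {
        Map<String, Pattern> rules = new LinkedHashMap<>();
        rules.put("GENERIC_TEXT", Pattern.compile("(?s).+"));
        rules.put("LINE_COMMENT", Pattern.compile("//[^\\r\\n]*"));
        return rules;
    }

    // Stand-in for ~[] : exactly one character.
    static Map<String, Pattern> singleCharRules() {
        Map<String, Pattern> rules = new LinkedHashMap<>();
        rules.put("OTHER", Pattern.compile("(?s)."));
        rules.put("LINE_COMMENT", Pattern.compile("//[^\\r\\n]*"));
        return rules;
    }

    public static void main(String[] args) {
        String input = "// c\nx";
        // One GENERIC_TEXT token covering the whole input, comment included.
        System.out.println(tokenize(input, greedyRules()).size());
        // The comment rule wins at offset 0 because it is longer than one char.
        System.out.println(tokenize(input, singleCharRules()).get(0));
    }
}
```

With the greedy rule, the comment rule never gets a chance: it matches 4 characters at offset 0, while the catch-all matches all 6. With the single-character rule, any comment is longer than one character, so the comment rule is chosen.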
A quick demo:
PARSER_BEGIN(CommentStripParser)

public class CommentStripParser {
    public static void main(String[] args) throws Exception {
        java.io.FileInputStream file = new java.io.FileInputStream(new java.io.File(args[0]));
        CommentStripParser parser = new CommentStripParser(file);
        parser.parse();
    }
}

PARSER_END(CommentStripParser)

// Any single character that is not part of a comment.
TOKEN :
{
    < OTHER : ~[] >
}

// A single-line comment is skipped as a whole; "/*" switches to the comment state.
SKIP :
{
    < "//" (~["\r", "\n"])* >
|   < "/*" > : ML_COMMENT_STATE
}

// Inside a multi-line comment: skip char by char until "*/" switches back.
<ML_COMMENT_STATE> SKIP :
{
    < "*/" > : DEFAULT
|   < ~[] >
}

// Copy every OTHER token to the output unchanged.
void parse() :
{
    Token t;
}
{
    ( t=<OTHER> {System.out.print(t.image);} )* <EOF>
}
Given the test file:
/*
 * comments
 */
class Test {
    // more comments
    int foo() {
        return 42;
    }
}
Run the demo like this (assuming you have the files CommentStripParser.jj, Test.java and the JAR javacc.jar in the same directory):
java -cp javacc.jar javacc CommentStripParser.jj
javac -cp . *.java
java -cp . CommentStripParser Test.java
the following would be printed to your console:
class Test {
    int foo() {
        return 42;
    }
}
(no comments anymore)
Note that you will still need to account for string literals that might look like this:
"the following: /*, is not the start of a comment"
and char literals:
'"' // not the start of a string literal!
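To make those two cases concrete, here is a rough, hand-written sketch in plain Java (not a JavaCC grammar) of the states a comment stripper needs once literals are taken into account. The class name LiteralAwareStripper and its strip method are made up for this illustration, and the sketch assumes ordinary backslash escapes only; it ignores trigraphs, backslash-newline line continuations and unterminated literals.

```java
class LiteralAwareStripper {

    private enum State { CODE, LINE_COMMENT, BLOCK_COMMENT, STRING, CHAR }

    // Returns src with // and /* */ comments removed; literals are copied verbatim.
    static String strip(String src) {
        StringBuilder out = new StringBuilder();
        State state = State.CODE;
        int i = 0;
        while (i < src.length()) {
            char c = src.charAt(i);
            char next = i + 1 < src.length() ? src.charAt(i + 1) : '\0';
            switch (state) {
                case CODE:
                    if (c == '/' && next == '/') { state = State.LINE_COMMENT; i += 2; continue; }
                    if (c == '/' && next == '*') { state = State.BLOCK_COMMENT; i += 2; continue; }
                    if (c == '"') state = State.STRING;
                    else if (c == '\'') state = State.CHAR;
                    out.append(c);
                    break;
                case LINE_COMMENT:
                    // The line terminator is not part of the comment, so keep it.
                    if (c == '\n' || c == '\r') { state = State.CODE; out.append(c); }
                    break;
                case BLOCK_COMMENT:
                    if (c == '*' && next == '/') { state = State.CODE; i += 2; continue; }
                    break;
                case STRING:
                case CHAR:
                    out.append(c);
                    // Copy an escaped character blindly so \" or \' cannot end the literal.
                    if (c == '\\' && i + 1 < src.length()) { out.append(next); i += 2; continue; }
                    if ((state == State.STRING && c == '"') || (state == State.CHAR && c == '\'')) {
                        state = State.CODE;
                    }
                    break;
            }
            i++;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(strip("s = \"/* not a comment */\"; // real comment\n"));
        System.out.println(strip("c = '\"'; /* real comment */ d = 1;"));
    }
}
```

In a JavaCC grammar, the same effect would presumably come from adding string-literal and char-literal tokens next to OTHER, so that the longest-match rule picks the whole literal before "/*" or "//" inside it can be seen.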