I have defined a LEX scanner with the following rule for scanning (nested) comments:
"(*" {
    int linenoStart, level, ch;
    linenoStart = yylineno;
    level = 1;
    do {
        ch = input();
        switch (ch) {
        case '(':
            ch = input();
            if (ch == '*') {
                level++;
            } else {
                unput(ch);
            }
            break;
        case '*':
            ch = input();
            if (ch == ')') {
                level--;
            } else {
                unput(ch);
            }
            break;
        case '\n':
            yylineno++;
            break;
        }
    } while ((level > 0) && (ch > 0));
    assert((ch >= 0) || (ch == EOF));
    if (level > 0) {
        fprintf(stderr, "error: unterminated comment starting at line %d", linenoStart);
        exit(EXIT_FAILURE);
    }
}
When compiled with FLEX 2.6.4 I get the following error message when I run the scanner on an input file containing a comment with more than 16382 characters:
input buffer overflow, can't enlarge buffer because scanner uses REJECT
Why is that and how can the problem be resolved?
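While the action for a matched pattern runs, in this case the action for (*, the matched text stays in flex's input buffer, and every character retrieved with the function input is read into that same buffer, which holds YY_BUF_SIZE = 16384 characters. Flex normally enlarges the buffer when it fills up, but, as the error message says, it cannot do so in a scanner that uses REJECT. This limits the size of a comment to about YY_BUF_SIZE characters. To allow comments of any length we can instead use a start condition (here named comment, which has to be declared as exclusive with %x comment in the definitions section). Each piece of the comment is then matched as a separate short token, so the buffer never has to hold the whole comment: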
"(*" {
    BEGIN(comment);
    commentNestingLevel = 1;
    commentStartLine = yylineno;
}
<comment>[^*(\n]+    /* skip ordinary comment text */
<comment>\n {
    yylineno++;
}
<comment>"*"+/[^)]    /* skip asterisks that do not close a comment */
<comment>"("+/[^*]    /* skip parentheses that do not open a comment */
<comment>"(*" commentNestingLevel++;
<comment>"*)" {
    commentNestingLevel--;
    if (commentNestingLevel == 0) {
        BEGIN(INITIAL);
    }
}
<comment><<EOF>> {
    fprintf(stderr, "error: unterminated comment starting at line %d", commentStartLine);
    exit(EXIT_FAILURE);
}
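The rules above refer to two variables that are not declared anywhere in the snippet. A minimal sketch of a matching definitions section could look like this, assuming plain file-scope int variables are sufficient and that nothing else in the scanner already declares these names:

```lex
%x comment

%{
#include <stdio.h>
#include <stdlib.h>

/* Assumed declarations for the names used in the comment rules. */
static int commentNestingLevel;  /* current nesting depth of comments */
static int commentStartLine;     /* line on which the outermost comment began */
%}
```

The start condition is declared with %x rather than %s so that, while the scanner is inside a comment, only the rules prefixed with <comment> are active.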