
Input buffer overflow in scanner for long comments


I have defined a LEX scanner with the following rule for scanning (nested) comments:

"(*" {
    int linenoStart, level, ch;

    linenoStart = yylineno;
    level = 1;
    do {
        ch = input();
        switch (ch) {
            case '(':
                ch = input();
                if (ch == '*') {
                    level++;
                } else {
                    unput(ch);
                }
                break;
            case '*':
                ch = input();
                if (ch == ')') {
                    level--;
                } else {
                    unput(ch);
                }
                break;
            case '\n':
                yylineno++;
                break;
        }
    } while ((level > 0) && (ch > 0));
    assert((ch >= 0) || (ch == EOF));
    
    if (level > 0) {
        fprintf(stderr, "error: unterminated comment starting at line %d", linenoStart);
        exit(EXIT_FAILURE);
    }
}

When I generate the scanner with FLEX 2.6.4 and run it on an input file containing a comment longer than 16382 characters, I get the following error message:

input buffer overflow, can't enlarge buffer because scanner uses REJECT

Why is that and how can the problem be resolved?


Solution

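  • After a pattern has been matched (here "(*"), the characters read with input() inside the action come from the scanner's input buffer, which holds YY_BUF_SIZE = 16384 characters by default. As the error message says, this scanner cannot enlarge that buffer, so a comment consumed within a single action is limited to about YY_BUF_SIZE characters. To allow comments of any length, use a start condition instead, so that the comment is consumed as a sequence of short tokens. The rules below assume that the start condition has been declared in the definitions section (for example with %x comment) and that commentNestingLevel and commentStartLine are int variables declared there as well: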

    "(*" {
        BEGIN(comment);
        commentNestingLevel = 1;
        commentStartLine = yylineno;
    }
    
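    <comment>[^*(\n]+    /* ordinary text: discard */
    
    <comment>\n {
        yylineno++;
    }
    
    <comment>"*"+/[^)]   /* asterisks not followed by ')': discard */
    
    <comment>"("+/[^*]   /* parentheses not followed by '*': discard */
    
    <comment>"(*" commentNestingLevel++;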
    
    <comment>"*)" {
        commentNestingLevel--;
        if (commentNestingLevel == 0) {
            BEGIN(INITIAL);
        }
    }
    
    <comment><<EOF>> {
        fprintf(stderr, "error: unterminated comment starting at line %d\n", commentStartLine);
        exit(EXIT_FAILURE);
    }
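
The nesting logic of these rules can be tried outside the scanner. The following is a minimal sketch in plain C, not flex code: skip_nested_comment is a hypothetical helper introduced only for illustration, and it walks an in-memory string rather than a flex buffer. It assumes the opening "(*" has already been consumed, as in the rules above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of flex. Given text that starts just after
   an opening "(*", return the offset one past the matching "*)", or -1 if
   the comment is unterminated. *lines is incremented for every newline
   seen, like the <comment>\n rule. */
static long skip_nested_comment(const char *s, int *lines)
{
    int level = 1;      /* the opening "(*" was already consumed */
    size_t i = 0;

    assert(s != NULL && lines != NULL);

    while (s[i] != '\0') {
        if (s[i] == '(' && s[i + 1] == '*') {           /* <comment>"(*" */
            level++;
            i += 2;
        } else if (s[i] == '*' && s[i + 1] == ')') {    /* <comment>"*)" */
            level--;
            i += 2;
            if (level == 0) {
                return (long)i;     /* corresponds to BEGIN(INITIAL) */
            }
        } else {
            if (s[i] == '\n') {
                (*lines)++;
            }
            i++;                    /* any other character is skipped */
        }
    }
    return -1;                      /* corresponds to <comment><<EOF>> */
}
```

Like the start-condition rules, the helper inspects at most two characters at a time, so the length of the comment has no effect on how much text has to be held at once.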