
Use flex to uppercase C comments


I want to uppercase C comments in a text using flex.

Here is my flex code:

%{
#include <ctype.h>
%}

%%

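\/\/.* {
    /* yyleng is the match length; the cast avoids undefined behavior
       when toupper() gets a negative char */
    for (int i = 0; i < yyleng; i++)
        putchar(toupper((unsigned char)yytext[i]));
}

\/\*[^\*]*\*(\*|[^\*\/][^\*]*)*\/ {
    for (int i = 0; i < yyleng; i++)
        putchar(toupper((unsigned char)yytext[i]));
}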

%%

int main(int argc, char *argv[]) {
    yylex();
    return 0;
}

int yywrap() {
    return 1;
}

Here is the testing text:

/*aBc*aBc/aBc*/
/** /aBc*/
/*aBc*/aBc*/
aBc
aBc/*aBC
aBc/aBc*aBc
aBc**/
/*aBc/*aBc
//aBc
//aBc
aBc

The result:

/*ABC*ABC/ABC*/
/** /ABC*/
/*ABC*/aBc*/
aBc
aBc/*ABC
ABC/ABC*ABC
ABC**/
/*ABC/*ABC
//ABC
//aBc
aBc

The second and fourth lines from the end of the result are wrong.

What's wrong with my program?


Solution

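  • The problem is that the regular expression you are using is incorrect.
    In the group [^\*\/][^\*]*, only the first character is prevented from
    being a /; the trailing [^\*]* accepts slashes (and newlines), so the
    closing \/ can match any slash, not only one that directly follows a *.
    Because flex always takes the longest match, the unterminated /*aBc/*aBc
    is treated as a comment that runs up to and including the last // in the
    input. That whole span is uppercased, and the rest of the final //aBc
    line is echoed unchanged.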

    Here's a correct flex regular expression for C-style comments:

    "/*"[^*]*"*"+([^*/][^*]*"*"+)*"/"
    

    That uses double-quotes (a flex feature) to quote the regex metacharacters. Note that it is not necessary to escape regex operators inside character classes.

    The alternatives aren't very pretty either:

    1. Forest of leaning timber:

      \/\*[^*]*\*+([^*/][^*]*\*+)*\/
      
    2. Clutter of character classes:

      [/][*][^*]*[*]+([^*/][^*]*[*]+)*[/]