
Lex: recognizing data types in a file


I'm a beginner with lex. My program reads a file and recognizes the data types in it. I get the following error when I run "gcc -c lex.yy.c" on the Windows command line:

code1.l: In function 'yylex':

code1.l:8:2: error: expected ';' before '{' token

My input file is:

#include <stdio.h> 
int main()
{
    int a;
    float b;
    long int c;
    long long int d;

    return 0;
}

My lex code is:

    %{
    #include <stdio.h>
    #include <stdlib.h>
%}

%%
    "int"
    {
        printf("%s ->Integer data type\n", yytext);
    }
    "float"
    {
        printf("%s ->Floating point data type\n", yytext);
    }
    "double"
    {
        printf("%s ->double data type\n", yytext);
    }
    "char"
    {
        printf("%s ->character data type\n", yytext);
    }
    "void"
    {
        printf("%s ->void data type\n", yytext);
    }
    "bool"
    {
        printf("%s ->boolean data type\n", yytext);
    }
    "long"([ ])([ ])* "int"
    {
        printf("%s ->long int data type\n", yytext);
    }
    "long"([ ])([ ])* "long"([ ])([ ])* "int"
    {
        printf("%s ->long long int data type\n", yytext);
    }
    ([a-z])* 
    {
        printf("%s ->Not a data type\n", yytext);
    }
    ([0-9])* 
    {
        printf("%s ->Not a data type\n", yytext);
    }

%%

int yywrap()
{
    return 1;
}

int main()
{
    FILE *fp = fopen("sampleIO.txt","r");
    yyin=fp;
    yylex();
    return 0;
}

I can't find the error. Any help would be appreciated. Thank you.


Solution

  • In lex/flex, the patterns must start right at the beginning of a line, as explained in the flex manual:

    The rules section of the flex input contains a series of rules of the form:
    pattern action
    where the pattern must be unindented and the action must begin on the same line.

    Moreover,

    Any indented text… is copied verbatim to the output.

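    So your indented patterns, starting at line 8 as indicated in the gcc error message, were passed straight through to the lex.yy.c file, where they failed to compile because they are not valid C. (gcc reports the position in code1.l rather than in lex.yy.c because flex writes #line directives into the generated file.)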

    Also, patterns cannot have unquoted whitespace. So even with correct indentation, the pattern:

    "long"([ ])([ ])* "int"
    

    would need to be written

    "long"([ ])([ ])*"int"
    

    With the extra space, it looks to flex like a pattern ("long"([ ])([ ])*) and an action ("int"). There is no need to place parentheses around square brackets; that only serves to obscure the pattern. And flex includes both the * regular expression operator (0 or more repeats) and the + operator (1 or more repeats), so that pattern could be written as:

    "long"[ ]+"int"
    

    which is much clearer, but not actually correct; the words long and int could be separated by tabs or newline characters (or, as it happens, comments, but we won't get into that just yet). So a better but still not perfect pattern might be:

    "long"[[:space:]]+"int"