compiler-construction bison flex-lexer lex

Understanding lex code syntax


In the following lex code, I don't understand the use of the angle brackets (<>). How does the <newstate>{DEFINITION} syntax work?

    %{
    #include <stdio.h>
    #include <stdlib.h>   /* system() */
    int c = 0;            /* number of comments removed */
    %}
    START "/*"
    END "*/"
    SIMPLE [^*]
    SPACE [ \t\n]
    COMPLEX "*"[^/]
    %s newstate
    %%
    "//"(.*[ \t]*.*)*[\n]+    {c++; fprintf(yyout," ");}
    {START}                   {yymore(); BEGIN newstate;}
    <newstate>{SIMPLE}        {yymore(); BEGIN newstate;}
    <newstate>{COMPLEX}       {yymore(); BEGIN newstate;}
    <newstate>{SPACE}         {yymore(); BEGIN newstate;}
    <newstate>{END}           {c++; fprintf(yyout," "); BEGIN 0;}
    %%
    /* program to remove comments */
    int main(void)
    {
        yyin = fopen("file4", "r");
        yyout = fopen("fileout4", "w");
        system("cat file4");
        yylex();
        fflush(yyout);    /* make sure the output is on disk before cat reads it */
        system("cat fileout4");
        printf("no.of comments=%d", c);
        fclose(yyin);
        fclose(yyout);
        return 0;
    }

Solution

  • With "%s newstate" you declare a start condition; in your case its name is "newstate". You can use %s, %S or %Start to declare a start condition.

    A start condition is referenced at the head of a rule by writing its name in <> brackets, directly in front of the pattern (no space in between).

    e.g. restricting the {SIMPLE} rule to the newstate start condition:

                    <newstate>{SIMPLE}       { yymore(); BEGIN newstate; }
    

    The rule above is recognized only while Lex is in the start condition named "newstate". "BEGIN 0;" returns Lex to its initial state; you enter the "newstate" start condition by executing the action statement

                              BEGIN newstate;
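
To make the mechanism concrete, here is a rough hand-written C model of what the start condition does in the comment remover from the question. It is only a sketch under simplifying assumptions: the names scan_state, INITIAL_STATE, IN_COMMENT and strip_comments are hypothetical, the "//" rule and yymore() are not modelled, and a real generated scanner is table driven rather than written like this. The point is that BEGIN just sets a state variable, and a rule prefixed with <newstate> is only tried while that variable holds the matching value.

```c
#include <stddef.h>

/* Hypothetical names, for illustration only. */
enum scan_state { INITIAL_STATE, IN_COMMENT };

/* Copies src to dst, replacing every block comment with a single space.
 * dst must have room for strlen(src) + 1 bytes.
 * Returns the number of comments that were closed. */
int strip_comments(const char *src, char *dst)
{
    enum scan_state state = INITIAL_STATE;  /* the state the scanner starts in */
    int count = 0;
    size_t i = 0, o = 0;

    while (src[i] != '\0') {
        if (state == INITIAL_STATE && src[i] == '/' && src[i + 1] == '*') {
            state = IN_COMMENT;             /* plays the role of BEGIN newstate */
            i += 2;
        } else if (state == IN_COMMENT && src[i] == '*' && src[i + 1] == '/') {
            dst[o++] = ' ';                 /* like fprintf(yyout, " ") */
            count++;
            state = INITIAL_STATE;          /* plays the role of BEGIN 0 */
            i += 2;
        } else if (state == IN_COMMENT) {
            i++;                            /* comment body: consumed, not copied */
        } else {
            dst[o++] = src[i++];            /* default action: echo the character */
        }
    }
    dst[o] = '\0';
    return count;
}
```

Calling strip_comments on "a/* x */b/**/c" with a large enough buffer leaves "a b c" in the buffer and returns 2.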
    

    Here is a small example to show how this is used. It has three start conditions, one per category: AN = animals, PT = planets and BR = birds.

    This flex example tells you which category a name belongs to when the name is typed at the start of a line and followed by "is ?". There are three categories: animals, planets and birds (to keep it simple, only monkey, horse, Jupiter and swan are handled). Note that the unquoted pattern is? means "i" followed by an optional "s", so the rule matches the word "is"; the blank and the "?" that follow are discarded by the catch-all "." rule.

                         %{
                         #include <stdio.h>
                         %}
    
                         %START AN PT BR
    
                         %%
                         ^monkey             {ECHO; BEGIN AN;}
                         ^horse              {ECHO; BEGIN AN;}
                         ^Jupiter            {ECHO; BEGIN PT;}
                         ^swan               {ECHO; BEGIN BR;}
                         \n                  {ECHO; BEGIN 0;}
                         <AN>is?             printf(" is an Animal.!");
                         <PT>is?             printf(" is a Planet in our solar system.!");
                         <BR>is?             printf(" is a Bird.!");
                         .                   ;
                         %%
    
                         int main(void)
                         {
                             yylex();
                             return 0;
                         }
    

    For the following inputs, " is ?" is replaced according to the name at the start of the line:

                     input  ->          monkey is ?
                     output ->          monkey is an Animal.!
    

    Here " is ?" is replaced with " is an Animal.!": matching "monkey" switches the lexical analyzer to the "AN" start condition, so the rule <AN>is? becomes active and its action printf(" is an Animal.!"); is executed.

                     input  ->          swan is ?
                     output ->          swan is a Bird.!
    

    Here " is ?" is replaced with " is a Bird.!": matching "swan" switches the lexical analyzer to the "BR" start condition, so the rule <BR>is? becomes active and its action printf(" is a Bird.!"); is executed.

                     input  ->          horse is ?
                     output ->          horse is an Animal.!
    

    As with monkey, matching "horse" switches the lexical analyzer to the "AN" start condition, so " is ?" is again replaced with " is an Animal.!".

                     input  ->          Jupiter is ?
                     output ->          Jupiter is a Planet in our solar system.!
    

    Here " is ?" is replaced with " is a Planet in our solar system.!": matching "Jupiter" switches the lexical analyzer to the "PT" start condition, so the rule <PT>is? becomes active and its action printf(" is a Planet in our solar system.!"); is executed.

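    So in this example " is ?" is replaced according to the name that starts the line. If that name is Jupiter, we echo "Jupiter" and switch the lexical analyzer to the "PT" start condition, so only the rule tagged <PT> can match the following "is". The \n rule then executes BEGIN 0, which resets the scanner to its initial state for the next line.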
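
The same trace can be followed in plain C. The sketch below is a hand-written approximation of the AN/PT/BR rules for a single line, not the code flex generates; classify_line, the names table and the suffix table are hypothetical helpers introduced only for illustration, and the newline rule (ECHO; BEGIN 0;) is reduced to "stop at the end of the line".

```c
#include <stdio.h>
#include <string.h>

enum category { NONE, AN, PT, BR };   /* NONE plays the role of state 0 */

static const struct { const char *word; enum category cat; } names[] = {
    { "monkey", AN }, { "horse", AN }, { "Jupiter", PT }, { "swan", BR },
};

/* Indexed by enum category; what each <STATE>is? rule prints. */
static const char *suffix[] = {
    "", " is an Animal.!", " is a Planet in our solar system.!", " is a Bird.!",
};

/* Rewrites one input line into out (outsz must be at least 1). */
void classify_line(const char *line, char *out, size_t outsz)
{
    enum category state = NONE;
    size_t i = 0;

    out[0] = '\0';

    /* The ^name rules: only tried at the very start of the line. */
    for (size_t k = 0; k < sizeof names / sizeof names[0]; k++) {
        size_t len = strlen(names[k].word);
        if (strncmp(line, names[k].word, len) == 0) {
            snprintf(out, outsz, "%s", names[k].word);  /* ECHO */
            state = names[k].cat;                        /* BEGIN AN, PT or BR */
            i = len;
            break;
        }
    }

    while (line[i] != '\0' && line[i] != '\n') {
        if (state != NONE && line[i] == 'i') {
            /* <AN>is? and friends: 'i' followed by an optional 's'. */
            size_t used = strlen(out);
            snprintf(out + used, outsz - used, "%s", suffix[state]);
            i += (line[i + 1] == 's') ? 2 : 1;
        } else {
            i++;   /* the catch-all '.' rule discards everything else */
        }
    }
}
```

With a 128-byte buffer, classify_line("monkey is ?", buf, sizeof buf) leaves "monkey is an Animal.!" in buf. A line that does not start with one of the four names never leaves the initial state, so none of the tagged rules can fire and the buffer stays empty.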

    I hope this helped you to understand, let me know if you have any issues with the explanation!