
Lex: Match multiple regexes at the same time


I have the code below. I want to count the number of characters using

(.)   {charCount++;}

and at the same time count the number of words using

([a-zA-Z0-9])*    {wordcount++;}

Is this possible using lex rules, or do I have to count it "manually/programmatically" using the file stream in C code? Basically, is there a way to "continue matching" with another regex?

%{
#include <stdio.h>
int wordcount = 0;
int charCount = 0;
%}
%% 
[\t ]+  ; //ignore white space
"\n" ;  //ignore newlines
([a-zA-Z0-9])*    {wordcount++;}
(.)   {charCount++;}
%% 
int yywrap(void){ return 1; }
int main(void) 
{    
    // The function that starts the analysis 
    yyin=fopen("input.txt", "r");
    yylex(); 
    printf("The number of words in the file is %d and charCount %d\n",wordcount,charCount);
    return 0; 
}

Solution

  • The number of characters matched by a rule is available in yyleng, so you can do this:

    [ \t\n]         ;
    [a-zA-Z0-9]+    { ++wordcount; charcount += yyleng; }
    .               { ++charcount; }
    

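    But (f)lex is not designed to do multiple scans over the input: each piece of input is consumed by exactly one rule, chosen by longest match (ties go to the rule listed first). So there's no simple general solution.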

    FWIW, I'd use [[:alnum:]] instead of [a-zA-Z0-9] and [[:space:]] instead of [ \t\n].