
How to write a regex in lex that negates whitespace


I am new to Lex (Flex) and am working on an exercise that asks for a lex program that copies a file, replacing each non-empty sequence of whitespace with a single blank. Here is what I have tried:

%{
    FILE *rp,*wp;
    /*Read pointer and write pointer*/
%}

delim   [ \t\n]
ws      {delim}+
nows    [^{ws}]
%%
{nows}  {fprintf(wp,"%s",yytext);}
{ws}    {fprintf(wp,"%c",' ');}
%%
int yywrap(){}

int  main(int argc,char** argv){
    rp=fopen(argv[1],"r");
    wp=fopen(argv[2],"w+");
    yyin=rp;
    yylex();
    fclose(yyin);
    fclose(wp);
    return 0;
}

I thought that using the caret (^) character would match any character other than whitespace, but instead it is removing w and s from the input.
Does anyone know how I can negate whitespace? Any other approach to solving the problem is also welcome.
Thank you in advance.


Solution

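  • With help from the compilers book by Alfred V. Aho and Jeffrey D. Ullman, here is a solution to the above problem. The original attempt fails because lex does not expand definitions inside a bracket expression: [^{ws}] means "any character except {, w, s, or }", so w and s match no rule and never reach the output file. To negate whitespace directly, list the characters themselves, as in [^ \t\n]. A simpler option is to define ws as [\t \n]+ and nows as . (any single character except newline). Even though . also matches a blank or a tab, lex always prefers the longest match and, when two rules match the same length, the rule listed first. So as long as the ws rule is written first, lex picks it whenever it sees whitespace. The complete code becomes: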

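    %{
        #include <stdio.h>
        FILE *rp, *wp;
        /* Read pointer and write pointer */
    %}
    
    ws      [\t \n]+
    nows    .
    %%
    {ws}    { /* listed first, so it wins one-character ties with nows */ fprintf(wp, " "); }
    {nows}  { fprintf(wp, "%s", yytext); }
    %%
    /* Returning 1 tells the scanner there are no more input files. */
    int yywrap(void) { return 1; }
    
    int main(int argc, char **argv) {
        if (argc < 3) {
            fprintf(stderr, "usage: %s input output\n", argv[0]);
            return 1;
        }
        rp = fopen(argv[1], "r");
        wp = fopen(argv[2], "w");
        if (rp == NULL || wp == NULL) {
            perror("fopen");
            return 1;
        }
        yyin = rp;
        yylex();
        fclose(yyin);
        fclose(wp);
        return 0;
    }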
    

    Here is a sample input file and the output file the program produces from it:
    input.txt

    This is     a test  file   for 
    the 
        program copy.l This file must be properly
    formatted.
    Here   we are trying to
        
            write     some gibberish   
        Also here is   some line.
    

    And here is its output
    output.txt

    This is a test file for the program copy.l This file must be properly formatted. Here we are trying to write some gibberish Also here is some line.
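
Since other approaches were welcome: the same transformation can also be sketched in plain C without lex, which may help in checking what the scanner is expected to produce. This is only a minimal sketch that works on an in-memory string rather than on files. The names collapse_ws and is_ws are made up for this illustration and are not part of lex or of the program above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Same character set as the lex definition [\t \n]: blank, tab, newline.
 * The '\0' guard is needed because strchr also matches the terminator. */
static int is_ws(char c)
{
    return c != '\0' && strchr(" \t\n", c) != NULL;
}

/*
 * Hypothetical helper: copy src to dst, replacing each non-empty run of
 * whitespace with a single blank. dst must have room for strlen(src) + 1
 * bytes. Returns the number of characters written (excluding the '\0').
 */
size_t collapse_ws(const char *src, char *dst)
{
    size_t n = 0;
    int in_run = 0;              /* nonzero while inside a whitespace run */

    assert(src != NULL && dst != NULL);

    for (; *src != '\0'; ++src) {
        if (is_ws(*src)) {
            if (!in_run)
                dst[n++] = ' ';  /* emit one blank at the start of a run */
            in_run = 1;
        } else {
            dst[n++] = *src;     /* copy non-whitespace unchanged */
            in_run = 0;
        }
    }
    dst[n] = '\0';
    return n;
}
```

The in_run flag plays the role of the + in [\t \n]+: a blank is written only on the first character of a run, and the rest of the run is skipped. A file-based version would apply the same flag logic while reading with getc and writing with putc.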