I am new to Lex (Flex) and am working on an exercise that asks me to write a Lex program that copies a file, replacing each non-empty sequence of whitespace with a single blank. Here is what I have tried:
%{
FILE *rp,*wp;
/*Read pointer and write pointer*/
%}
delim [ \t\n]
ws {delim}+
nows [^{ws}]
%%
{nows} {fprintf(wp,"%s",yytext);}
{ws} {fprintf(wp,"%c",' ');}
%%
int yywrap(){}
int main(int argc,char** argv){
rp=fopen(argv[1],"r");
wp=fopen(argv[2],"w+");
yyin=rp;
yylex();
fclose(yyin);
fclose(wp);
return 0;
}
I thought that by using the caret (^) I would match any character other than whitespace, but instead it is removing w and s from the input.
Does anyone know how I can negate the whitespace pattern? Any other approach to the problem is also welcome.
Thank you in advance.
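With help from the compilers book by Alfred V. Aho and Jeffrey D. Ullman, here is a solution to the above problem.
The original definition fails because a name in braces is not expanded inside a bracket expression: [^{ws}] matches any character other than {, w, s and }, which is why w and s never reached the output file.
ws can be defined as [\t \n]+ and nows can be defined as . (a single dot).
The dot matches any character except a newline, so it also matches a blank or a tab. Lex prefers the longest match and, when two rules match the same length, the rule listed first, so the {ws} rule must come before the {nows} rule; a single blank or tab is then handled by {ws}. (Defining nows as the negated class [^ \t\n] would avoid relying on rule order.)
The complete code therefore becomes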
%{
#include<stdio.h>
FILE *rp,*wp;
/*Read pointer and write pointer*/
%}
ws [\t \n]+
nows .
%%
{ws} {fprintf(wp," ");}
{nows} {fprintf(wp,"%s",yytext);}
%%
int yywrap(){return 1;} /* 1 = no further input files */
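int main(int argc,char** argv){
if(argc<3){fprintf(stderr,"usage: %s input output\n",argv[0]);return 1;}
rp=fopen(argv[1],"r");
wp=fopen(argv[2],"w");
if(!rp||!wp){perror("fopen");return 1;}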
yyin=rp;
yylex();
fclose(yyin);
fclose(wp);
return 0;
}
Here are an input file and the resulting output file, demonstrating how the program works:
input.txt
This is a test file for
the
program copy.l This file must be properly
formatted.
Here we are trying to
write some gibberish
Also here is some line.
And here is its output:
output.txt
This is a test file for the program copy.l This file must be properly formatted. Here we are trying to write some gibberish Also here is some line.
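Since other approaches were welcome: for comparison, the same transformation can be sketched in plain C without Lex. This is only a minimal illustration, not part of the original solution; the function name collapse_ws and its in-memory string interface are assumptions made for the example. It mirrors the two rules above: one blank per run of blanks, tabs, and newlines, and everything else copied unchanged.

```c
#include <stddef.h>

/* Hypothetical helper (not from the original post): copies src into dst,
   replacing each run of blanks, tabs, and newlines with a single blank.
   dst must have room for at least strlen(src) + 1 bytes.
   Returns the number of characters written, excluding the terminator. */
size_t collapse_ws(const char *src, char *dst)
{
    size_t n = 0;
    int in_ws = 0;                  /* nonzero while inside a whitespace run */

    for (; *src != '\0'; ++src) {
        if (*src == ' ' || *src == '\t' || *src == '\n') {
            if (!in_ws)
                dst[n++] = ' ';     /* one blank per run, like the {ws} rule */
            in_ws = 1;
        } else {
            dst[n++] = *src;        /* copy as-is, like the {nows} rule */
            in_ws = 0;
        }
    }
    dst[n] = '\0';
    return n;
}
```

A file-based version would read the input in chunks and carry the in_ws flag across chunk boundaries; the Lex program gets that behaviour for free from the scanner.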