I am trying to match comments (a comment starts and ends with curly brackets, {like this}) using this code:
```
com ^"\{"(.|\n)*"\}"$
%option noyywrap
%%
[^{com}] ;
{com} printf("%s",yytext);
%%
void main()
{
yylex();
}
```
When I run it on this piece of text:
```
first line {first comment}
second line {multiline
comment}
```
I get this output:
```
{comm}co{mcomm}
```
It seems to match only the letters c, o, and m (this changes when I change the name com: it matches every letter of the name used to define the comment), but at the same time it includes the curly brackets. I tried changing the test text, with no success.
There are several problems with your scanner definition. The biggest one is that the pattern in this rule ...
```
[^{com}] ;
```
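... doesn't mean at all what you seem to think it means. What you have in mind seems to be to ignore anything that does not match the regex to which `com` expands, but the pattern in that rule doesn't mean anything remotely like that. It is a character class (definition names are not expanded inside `[...]`), so it just matches and discards one character at a time, newline included, that is not among `{`, `}`, `c`, `o`, `m`. Those five characters are matched by no rule at all, so flex's default rule echoes them to the output, which is exactly the `{comm}co{mcomm}` you see. Your `{com}` rule never fires on that input, because its `^` anchor requires the opening brace to be the first character on a line.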
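To make that concrete, here is a hand-written C sketch that models what the original scanner does. It is not flex output. It covers only input on which the anchored `{com}` rule never fires, such as the sample text. The function name `simulate_original` and its output-buffer parameters are illustrative assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hand-written model (not flex output) of the ORIGINAL scanner, valid for
 * input on which the anchored {com} rule never fires (e.g. the sample).
 * [^{com}] matches any single character except '{', '}', 'c', 'o', 'm'
 * (newline included) and its empty action discards it; those five
 * characters match no rule, so flex's default rule echoes them. */
void simulate_original(const char *input, char *out, size_t cap)
{
    size_t n = 0;

    assert(input != NULL && out != NULL && cap > 0);
    for (const char *p = input; *p != '\0'; ++p) {
        if (strchr("{}com", *p) == NULL)
            continue;            /* [^{com}] matches it; empty action drops it */
        if (n + 1 < cap)
            out[n++] = *p;       /* no rule matches: default rule echoes it */
    }
    out[n] = '\0';
}
```

Called on the sample text from the question, this produces `{comm}co{mcomm}`, the same output you observed.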
The idiomatic way to handle input that is not matched by any (other) rule is to add a match-anything rule at the end of the rule list. That would bring you to something along these lines ...
```
com ^"\{"(.|\n)*"\}"$
%%
{com} printf("%s",yytext);
. /* discard a single character that otherwise is unmatched */;
%%
```
But then you will see that you have some secondary problems:
- The definition of `com` expands to a regular expression that is anchored to the beginning and end of a line. From your example input, you at least don't seem to want to anchor to the beginning of the line, but the fact that you have a closing delimiter at all suggests that you probably don't want to anchor to the end of the line, either.
- The `com` regex does not stop matching at the first closing brace. Because flex always takes the longest possible match, and `(.|\n)*` happily consumes `}` characters, it will collect everything from the opening brace of the first comment to the closing brace of the last comment.
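The greediness problem can be sketched in C. The two helpers below are hypothetical and written by hand. Each one computes how long a match would be when it starts at an opening brace. The first helper models `\{(.|\n)*\}`. For comparison, the second models a pattern like `\{[^}]*\}`, which cannot consume a closing brace. The `^` and `$` anchors are deliberately ignored, so this models only the longest-match behavior.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Length of the match of \{(.|\n)*\} starting at s, with the ^ and $
 * anchors ignored (0 means no match). flex takes the longest match, and
 * (.|\n)* may consume '}' characters, so the match ends at the LAST '}'. */
size_t greedy_match_len(const char *s)
{
    const char *brace;

    assert(s != NULL);
    if (*s != '{')
        return 0;
    brace = strrchr(s, '}');
    return brace != NULL ? (size_t)(brace - s) + 1 : 0;
}

/* Length of the match of \{[^}]*\} starting at s (0 means no match).
 * [^}]* cannot consume a '}', so the match ends at the FIRST '}'. */
size_t bounded_match_len(const char *s)
{
    const char *brace;

    assert(s != NULL);
    if (*s != '{')
        return 0;
    brace = strchr(s + 1, '}');
    return brace != NULL ? (size_t)(brace - s) + 1 : 0;
}
```

Starting at `{first comment}` in the sample text, the bounded version stops right after that comment. The greedy version runs on through the `}` that closes the last comment.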
Additionally, as a matter of style, it is not necessary or idiomatic to both quote and escape the curly braces; either `\{` or `"{"` on its own matches a literal brace.
This variation, then, will serve your intended purposes better:
```
com \{[^}]*\}
%%
{com} printf("%s",yytext);
. ;
%%
```
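That version of `com` expands to a regular expression matching an opening curly brace (`{`) anywhere, followed by any number of characters other than a closing curly brace (`}`), followed by a closing curly brace. It is not anchored to the beginning or end of a line, and because a negated character class also matches newlines, a comment may span several lines. Note also that `.` does not match a newline, so newlines outside comments fall through to flex's default rule and are echoed; that is why each comment ends up on its own line in the output.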
You don't actually need the trivial `main()` supplied in your original flex input, as linking with `-lfl` provides an equivalent one (along with a default `yywrap()`), and `%option noyywrap` is not essential for the purposes of this discussion, so that's a complete solution. Demo:
```
$ flex com.l
$ gcc -o com lex.yy.c -lfl
$ ./com <<'EOF'
first line {first comment}
second line {multiline
comment}
EOF
{first comment}
{multiline
comment}
```
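The sketch below is a hand-written C model of the corrected scanner, not code generated by flex. It reproduces the demo output, including the newlines that `.` leaves to the default rule. The function `scan_comments` and its buffer parameters are illustrative assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hand-written model (not flex output) of the corrected scanner:
 *   \{[^}]*\}  printf("%s", yytext);   -> copy the whole comment
 *   .          ;                       -> drop one non-newline char
 * '.' does not match '\n', so a newline outside a comment falls through
 * to flex's default rule, which echoes it. */
void scan_comments(const char *input, char *out, size_t cap)
{
    size_t n = 0;
    const char *p = input;

    assert(input != NULL && out != NULL && cap > 0);
    while (*p != '\0') {
        /* A '{' with no later '}' cannot match the comment rule;
         * '.' then discards it like any other character. */
        const char *end = (*p == '{') ? strchr(p + 1, '}') : NULL;

        if (end != NULL) {
            for (; p <= end; ++p)        /* comment rule: copy yytext */
                if (n + 1 < cap)
                    out[n++] = *p;
        } else {
            /* '\n' is echoed by the default rule; any other
             * character is matched by '.' and dropped. */
            if (*p == '\n' && n + 1 < cap)
                out[n++] = *p;
            ++p;
        }
    }
    out[n] = '\0';
}
```

On the sample text this yields `{first comment}`, a newline, `{multiline`, a newline, `comment}`, and a final newline, matching the demo.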
Of course, that `com` macro is now simple enough that, given it is used only once anyway, you could consider incorporating the regex directly into your rule instead of going through the `com` definition. But that's a stylistic matter that could go either way.
Also, if you wanted to strip the comments instead of the non-comments, or if you wanted to exclude the braces from the output comment text, then you could do that by making appropriate modifications to the actions of those rules.
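To exclude the braces, the comment rule's action only needs to print `yytext` without its first and last characters. In flex that could be something like `printf("%.*s", (int)yyleng - 2, yytext + 1);`. The C sketch below exercises the same `%.*s` formatting on a plain string standing in for `yytext`/`yyleng`; the function name `comment_body` and its buffer parameters are illustrative. To strip the comments instead, you would swap the roles of the actions: give the comment rule an empty action and make the catch-all rule `ECHO`.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Writes the text of one matched comment without its braces.
 * text/len play the roles of flex's yytext/yyleng; the precision in
 * "%.*s" limits output to len - 2 characters, starting after the '{'. */
void comment_body(const char *text, int len, char *out, size_t cap)
{
    assert(text != NULL && out != NULL && cap > 0);
    assert(len >= 2 && text[0] == '{' && text[len - 1] == '}');
    snprintf(out, cap, "%.*s", len - 2, text + 1);
}
```

The precision argument of `%.*s` avoids copying or modifying `yytext`, which is why it suits a one-line action.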