Tags: network-programming, gcc, lex, flex-lexer

Regular expression for recognizing a TCP/UDP port (16-bit)


I have a lex file, port_regex.l, that contains the following code:

DECIMAL_16bits [ \t]*[:digit:]{1,4}[ \t]*
SPACE [ \t]

%x S_rule S_dst_port

%%

%{
    BEGIN S_rule;
%}

<S_rule>(dst-port){SPACE}   {
        BEGIN(S_dst_port);
    }

<S_dst_port>\{{DECIMAL_16bits}\}  {
        printf("\n\nMATCH [%s]\n\n", yytext);
        BEGIN S_rule;
    }

. { ECHO; }

%%

int main(void)
{
    while (yylex() != 0)
        ;
    return(0);
}

int yywrap(void)
{
    return 1;
}

I create an executable from it as follows.

flex port_regex.l
gcc lex.yy.c -o port_regex

which creates an executable called port_regex.

I have a file of test data called port.file, shown below:

dst-port {234}
dst-port {236}
dst-port {233}
dst-port {2656}

How do I test port.file using the port_regex executable?

Can I do something like:

./port_regex < port.file

I tried the above, but it doesn't seem to work.


Solution

  • So long as your application doesn't become a lot more complex, I think using start conditions is a good way to go, instead of introducing a yacc-generated parser.

    A couple of thoughts:

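    The examples I see sometimes use parentheses with BEGIN (BEGIN(comment)) and sometimes not (BEGIN comment). Since BEGIN is a macro that expands to an assignment, both forms behave the same, but you should be consistent; your scanner currently mixes BEGIN S_rule and BEGIN(S_dst_port).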

    The book says that the default rule to echo unmatched characters is still in effect, even under exclusive start conditions, so you shouldn't need

    . { ECHO; }
    

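    and since your start conditions are exclusive, it wouldn't fire anyway: a rule without a start-condition prefix is active only in INITIAL (and in inclusive %s conditions), and your scanner switches to S_rule every time yylex() is entered. If you want an explicit catch-all just to make sure, you might rewrite it as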

    <*>.|\n     ECHO;