How to match optional token in lex


I have a file with lines like the ones below.

PORT = en, PIN = P3; 
PORT = dummy[9], PIN = P41;
PORT = dummy[8], PIN = P42;
PORT = dummy[7], PIN = P43;
PORT = dummy[6], PIN = P44;
PORT = dummy[5], PIN = P45;
PORT = dummy[4], PIN = P46;
PORT = dummy[3], PIN = P47;
PORT = dummy[2], PIN = P48;
PORT = dummy[1], PIN = P49;
PORT = dummy[0], PIN = P50;
PORT = out1, PIN = P6; 

I'm trying to extract the PORT and PIN values using lex and yacc, as shown below.

Lex specification:

%%
"="                    { return EQUALS; }
","                    { return COMMA; }
";"                    { return SEMICOLON; }
PORT                   { return PORT; }
PIN                    { return PIN; }
[\[0-9\]]* {yylval.str =strdup(yytext);return BUS_PORT;}
[a-zA-Z_][a-zA-Z0-9_]* {yylval.str =strdup(yytext);return ALPHANUMERIC;}
"//".* | [\t]          {; }
"/*"[.\n]*"*/"         {; }
\n                     {; }
.                      {; }
%%

And the corresponding yacc file:

%token EQUALS
%token COMMA
%token SEMICOLON
%token PIN
%token PORT
%token <str> ALPHANUMERIC
%token <str> BUS_PORT
%type <str> port_name
%type <str> pin_name

%%

physical_command : sub_command
                 | physical_command sub_command
                 ;
sub_command      : port_command
                 ;

port_command     : PORT EQUALS port_name COMMA PIN EQUALS pin_name SEMICOLON
                 {
                   pm->addPortAndPin(std::string($3),std::string($7));
                 }
                 ;

port_name        : ALPHANUMERIC
                 | ALPHANUMERIC BUS_PORT
                 {
                   $$ = $1;
                 }
                 ;
pin_name         : ALPHANUMERIC
                 {
                   $$ = $1;
                 }
                 ;
%%

As you can see, a port name can be an array type (dummy[10], dummy[9], etc.) or a plain name. To parse both, I wrote the rule below.

port_name        : ALPHANUMERIC //for normal type
                 | ALPHANUMERIC BUS_PORT  //for array type
                 {
                   $$ = $1;
                 }

And the lex rules are:

[\[0-9\]]* {yylval.str =strdup(yytext);return BUS_PORT;}
[a-zA-Z_][a-zA-Z0-9_]* {yylval.str =strdup(yytext);return ALPHANUMERIC;}

My Question:

I'm not able to parse the array type with the above rules; my output is shown below. How should I change the rules so that both plain and array port names are parsed?

Port = en       Pin = P3
Port = dummy    Pin = P41 //should have been dummy[9]
Port = dummy    Pin = P42 //should have been dummy[8]
Port = dummy    Pin = P43
Port = dummy    Pin = P44
Port = dummy    Pin = P45
Port = dummy    Pin = P46
Port = dummy    Pin = P47
Port = dummy    Pin = P48
Port = dummy    Pin = P49
Port = dummy    Pin = P50
Port = out1     Pin = P6

Solution

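  • Matching each opening '[' with its closing ']' in an array element access is better handled in the parser than in the lexer.

    In the lex specification, the pattern below is a single character class, so it accepts any mix of '[', ']' and digits (for example "]]3[") rather than a well-formed index. Replace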

    [\[0-9\]]* {yylval.str =strdup(yytext);return BUS_PORT;}
    

    by

      [0-9]+   { yylval.str = strdup(yytext); return BUS_PORT; }
      "["      { return *yytext; }
      "]"      { return *yytext; }
    

    In the parser specification, replace

      port_name  : ALPHANUMERIC
                 | ALPHANUMERIC BUS_PORT { $$ = $1;}
    

    by

      port_name  : ALPHANUMERIC
                 | ALPHANUMERIC '[' BUS_PORT ']' { $$ = $1;}
    

    Another option is to keep the parser as it is and change the BUS_PORT token rule to

      \[[0-9]+\] {yylval.str =strdup(yytext);return BUS_PORT;}
    

    This recognizes [ followed by one or more digits followed by ]. It works only for a fixed number of dimensions (one, with the port_name rule above).
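
Note that in both variants the action $$ = $1 copies only the identifier, so the port would still come out as dummy rather than dummy[9]. Below is a minimal sketch of one way to rebuild the full name for the first variant, where BUS_PORT carries just the digits. make_bus_port_name is a hypothetical helper, not part of the original grammar; it is assumed to live in the yacc prologue or a separate source file.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (an assumption, not from the original post):
 * returns a newly allocated "name[index]" string, or NULL if the
 * allocation fails. The caller owns the result and must free() it. */
char *make_bus_port_name(const char *name, const char *index)
{
    /* name + '[' + index + ']' + terminating '\0' */
    size_t len = strlen(name) + strlen(index) + 3;

    /* The cast keeps this valid when the parser is compiled as C++. */
    char *out = (char *)malloc(len);
    if (out != NULL)
        snprintf(out, len, "%s[%s]", name, index);
    return out;
}
```

The array alternative could then read, for example, ALPHANUMERIC '[' BUS_PORT ']' { $$ = make_bus_port_name($1, $3); }. Since $1 and $3 come from strdup, freeing them after the call avoids a leak. With the \[[0-9]+\] variant the brackets are already part of the token text, so plain concatenation of $1 and $2 would be enough.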