
Bison + Flex: using short forms of tokens


I'd like to implement some command lang ... Is there a way to implement token recognition so that each of the following forms yields the token for "CREATE":

CREATE  
CRE
CREA
CREAT

Another example:

DELE
DEL
DELET
DELETE

for the token "DELETE".

I know I can list each form as a separate rule, like this:

"CREATE" { return KWD_CREATE;}
"CRE"    { return KWD_CREATE;}


"DEL"     { return KWD_DELETE;}
"DELET"   { return KWD_DELETE;}

But is there a proper way to recognize abbreviated forms of keywords?

Update: I have tried the suggested trick, like this:

CRE(A(T(E?)?)?   { return KWD_CREATE;}
DEL(E(T(E?)?)?   { return KWD_DELETE;}

But then the following problem occurs:

CREATE - is recognized
CREAT - is recognized
CREA - is **not** recognized

I see "syntax error, unexpected id", where id is the identifier token, whose pattern is defined as follows:

identifier  [$_a-zA-Z][$_a-zA-Z0-9\%\*]*

Any ideas? What else do I need to check?

Thanks!


Solution

  • Flex has no shorthand syntax for this, but you can simply nest optional groups, for example:

    CRE(A(TE?)?)?   { return KWD_CREATE;}
    DEL(E(TE?)?)?   { return KWD_DELETE;}
    

    That would be easy enough to do programmatically if you were generating your lexer with some kind of generator-generator (a technique I find quite useful).

    Test:

    $ cat abbrev.l
    %option noinput nounput noyywrap nodefault 8bit
    %%
    cre(a(te?)?)?   { fprintf(stderr, "%s\n", "CREATE"); }
    del(e(te?)?)?   { fprintf(stderr, "%s\n", "DELETE"); }
    [[:alpha:]]+    { fprintf(stderr, "WORD: %s\n", yytext); }
    [[:space:]]+    ;
    .               { fprintf(stderr, "PUNC: %c\n", *yytext); }
    $ flex -o abbrev.c abbrev.l
    $ gcc -Wall -o abbrev abbrev.c -lfl
    $ ./abbrev
    create
    CREATE
    creat
    CREATE
    crea
    CREATE
    cre
    CREATE
    cr
    WORD: cr
    delete
    DELETE
    delet
    DELETE
    dele
    DELETE
    del
    DELETE
    de
    WORD: de
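
The remark above about generating such patterns programmatically could be sketched roughly as follows. This is only an illustration, not part of the original answer: the helper names `abbrev_pattern` and `flex_rule` and the keyword table are hypothetical, and the sketch assumes keywords consist only of letters and digits, so no regex escaping is needed.

```python
import re


def abbrev_pattern(keyword: str, min_len: int) -> str:
    """Return a pattern matching every prefix of `keyword` that has at
    least `min_len` characters, e.g. ("CREATE", 3) -> CRE(A(TE?)?)?

    Assumes `keyword` contains no regex metacharacters.
    """
    if not 1 <= min_len <= len(keyword):
        raise ValueError("min_len must be between 1 and len(keyword)")
    pattern = ""
    # Wrap the optional tail from the last character outward, so each
    # character is only allowed if all the preceding ones are present.
    for ch in reversed(keyword[min_len:]):
        pattern = f"({ch}{pattern})?" if pattern else f"{ch}?"
    return keyword[:min_len] + pattern


def flex_rule(keyword: str, min_len: int, token: str) -> str:
    """Format one flex rule line that returns `token` (hypothetical layout)."""
    return f"{abbrev_pattern(keyword, min_len):<16}{{ return {token}; }}"


if __name__ == "__main__":
    # Hypothetical keyword table: (keyword, minimum abbreviation, token name).
    keywords = [("CREATE", 3, "KWD_CREATE"), ("DELETE", 3, "KWD_DELETE")]
    for kw, n, tok in keywords:
        # Sanity check with Python's re module; the subset of syntax used
        # here (literals, groups, ?) means the same thing in flex.
        assert all(re.fullmatch(abbrev_pattern(kw, n), kw[:i])
                   for i in range(n, len(kw) + 1))
        print(flex_rule(kw, n, tok))
```

The printed lines can be pasted into the rules section of the `.l` file, or written there by a build step. As usual with flex, these rules need to appear before a general identifier rule, since flex picks the earliest rule among matches of equal length.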