
Must my Flex code be invalid alone to be valid with Bison?


Every tutorial begins with a Flex lexer only, and then introduces Bison. I can run Flex and Bison together and compile just fine (I've written myself a shell script for it), but shouldn't I be able to generate and compile from Flex alone as well? Or must I give up on that?

Omitting Bison produces numerous errors. Some of the offending lines include:

<tab>{FLA}  {yylval.midi = midi(yytext[1],yytext[0],1) ; return FLA;}

With the error "undefined reference to `yylval'". And:

#include "y.tab.h"

Which fails because the .h file cannot be located. I have taken to including a sed regex in my compile script to generate a kinder .l file that omits these problematic things, which I can then flex into C code and compile alone. That lets me keep track of my lexer, which I develop next to the parser for my notation.

Is that something that people do? Is there a different way to keep the Flex code valid by itself? Or do people just give up on that?


Solution

  • Yes, you can run flex code stand-alone. For some purposes the kind of finite-state machine that flex generates is the perfect tool for the job, and nothing further (such as a context-free grammar parser) is needed. An example I give my students is processing simple communication packets, which might need only a finite state machine; rather than hand-code it*, just use a tool like flex.

    However, when using flex and bison together, I think it is good software engineering practice to unit test the flex component separately from the bison component. It saves hours of debugging when you have shaken down the lexer first. The technique I teach my students here is to use C macros and conditional compilation to separate out the bison-dependent code. Others may have other mechanisms that they prefer.

    Let's have an example. Say you have a simple language with integer constants and identifiers that get passed back to the parser by yylval as shown in your question. I do it this way:

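    %{
    #ifdef PRINT
    /* Stand-alone build: print each token instead of returning it to a parser. */
    #define TOKEN(token) printf("Token: " #token ": %s\n", yytext)
    #else
    /* Parser build: token codes and yylval come from the Bison-generated header. */
    #include "y.tab.h"
    #define TOKEN(token) yylval=SymbolTable(yytext); return(token)
    #endif
    %}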
    
    identifier       [a-zA-Z][0-9a-zA-Z]*
    number           [0-9]+
    
    %%
    
    {identifier}     TOKEN(ID);
    {number}         TOKEN(NUMBER);
    

    Then I can build my stand-alone version this way:

    flex sample.l
    gcc -o lexer.exe lex.yy.c -lfl -DPRINT
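
    To see what the TOKEN macro does in each mode without running flex at all, here is a minimal plain-C sketch. Everything flex and bison would normally supply is replaced by a stand-in: the token codes, yylval, yytext, the body of SymbolTable, and the classify function are all assumptions made up for this illustration, not part of any generated code.

    ```c
    #include <assert.h>
    #include <ctype.h>
    #include <stdio.h>
    #include <string.h>

    /* Stand-ins for what flex and bison would normally provide (assumptions). */
    enum { ID = 258, NUMBER = 259 };   /* hypothetical token codes */
    int yylval;                        /* semantic value handed to the parser */
    const char *yytext;                /* text of the current lexeme */

    /* Hypothetical symbol-table lookup: here it just returns the lexeme length. */
    int SymbolTable(const char *text) { return (int)strlen(text); }

    #ifdef PRINT
    /* Stand-alone mode: report the token name and text, return nothing. */
    #define TOKEN(token) printf("Token: " #token ": %s\n", yytext)
    #else
    /* Parser mode: store the semantic value and return the token code. */
    #define TOKEN(token) yylval = SymbolTable(yytext); return (token)
    #endif

    /* Mimics one lexer action: classify a lexeme by its first character. */
    int classify(const char *lexeme)
    {
        assert(lexeme != NULL);
        yytext = lexeme;
        if (isalpha((unsigned char)lexeme[0])) {
            TOKEN(ID);
        } else {
            TOKEN(NUMBER);
        }
        return 0;   /* reached only when built with -DPRINT */
    }
    ```

    Built without -DPRINT, classify("abc") returns ID and leaves 3 in yylval. Built with -DPRINT, the same call prints "Token: ID: abc" and returns 0, because the macro no longer contains a return statement.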
    

    * I assume you know that a finite state machine is just a switch inside a loop:

    while (not <<EOF>>) {
      switch (state) {

        case state1:   ... break;

      }
    }
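
    As a rough illustration of the switch-inside-a-loop idea, here is a small hand-coded scanner in C for the same identifier and number patterns used in the flex example. The state names, the counts struct, and the scan function are hypothetical names chosen for this sketch; it only counts tokens rather than returning them, and it is not what flex itself generates.

    ```c
    #include <assert.h>
    #include <ctype.h>
    #include <stddef.h>

    /* States of a hypothetical hand-coded scanner (names are illustrative). */
    enum state { START, IN_ID, IN_NUMBER };

    /* Token counts produced by one pass over the input. */
    struct counts { int ids; int numbers; };

    /* Scan `input` with a switch inside a loop, counting identifiers
       ([a-zA-Z][0-9a-zA-Z]*) and numbers ([0-9]+). Other characters are skipped. */
    struct counts scan(const char *input)
    {
        struct counts c = {0, 0};
        enum state state = START;
        size_t i = 0;

        assert(input != NULL);

        for (;;) {
            unsigned char ch = (unsigned char)input[i];

            switch (state) {
            case START:
                if (isalpha(ch))      { state = IN_ID;     i++; }
                else if (isdigit(ch)) { state = IN_NUMBER; i++; }
                else if (ch == '\0')  { return c; }          /* end of input */
                else                  { i++; }               /* skip separators */
                break;

            case IN_ID:
                if (isalnum(ch)) { i++; }
                else { c.ids++; state = START; }             /* token ends; re-examine ch */
                break;

            case IN_NUMBER:
                if (isdigit(ch)) { i++; }
                else { c.numbers++; state = START; }         /* token ends; re-examine ch */
                break;
            }
        }
    }
    ```

    Calling scan("abc 123 x9") yields two identifiers and one number. Even for two patterns the bookkeeping is noticeable, which is the argument for letting flex generate the state machine instead.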