Search code examples
cwindowsdebuggingbisonflex-lexer

Flex and Bison, Windows Error using symbol table


Program is intended to store values in a symbol table and then have them be able to be printed out stating the part of speech. Further to be parsed and state more in the parser, whether it is a sentence and more.

I create the executable file by

flex try1.l
bison -dy try1.y
gcc lex.yy.c y.tab.c -o try1.exe

in cmd (WINDOWS)

My issue occurs when I try to declare any value when running the executable, verb run it goes like this BOLD IS INPUT

verb run
run
run

syntax error
noun cat
cat

syntax error
run
run

syntax error
cat run
syntax error

MY THOUGHTS: I'm unsure why I'm getting this error back from the code Syntax error. Although after debugging and trying to print out what value was being stored, I figured there has to be some kind of issue with the linked list. As it seemed only one value was being stored in the linked list and causing an error of sorts. As I tried to print out the stored word_type integer value for run and it would print out the correct value 259, but would refuse to let me define any other words to my symbol table. I reversed the changes of the print statements and now it works as previously stated. I think again there is an issue with the addword method as it isn't properly being added so the lookup method is crashing the program.

Lexer file, this example is taken from O'Reily 2nd edition on Lex And Yacc, Example 1-5,1-6. Am trying to learn Lex and Yacc on my own and reproduce this example.

%{
/*
* We now build a lexical analyzer to be used by a higher-level parser.
*/
#include <stdlib.h>
#include <string.h>
#include "ytab.h" /* token codes from the parser */
#define LOOKUP 0 /* default - not a defined word type. */
int state;
%}
/* 
*  Example from page 9 Word recognizer with a symbol table. PART 2 of Lexer
*/
%%
\n { state = LOOKUP; } /* end of line, return to default state */
    \.\n { state = LOOKUP;
    return 0; /* end of sentence */
    }
        /* whenever a line starts with a reserved part of speech name */
        /* start defining words of that type */
        ^verb { state = VERB; }
        ^adj { state = ADJ; }
        ^adv { state = ADV; }
        ^noun { state = NOUN; }
        ^prep { state = PREP; }
        ^pron { state = PRON; }
        ^conj { state = CONJ; }
        [a-zA-Z]+ {
            if(state != LOOKUP) {
            add_word(state, yytext);
            } else {
                switch(lookup_word(yytext)) {
                case VERB:
                return(VERB);
                case ADJECTIVE:
                return(ADJECTIVE);
                case ADVERB:
                return(ADVERB);
                case NOUN:
                return(NOUN);
                case PREPOSITION:
                return(PREPOSITION);
                case PRONOUN:
                return(PRONOUN);
                case CONJUNCTION:
                return(CONJUNCTION);
                default:
                printf("%s: don't recognize\n", yytext);
                /* don't return, just ignore it */
                    }
                    }
               }
        . ;
%%
int yywrap()
{
return 1;
}
/* define a linked list of words and types */
struct word {
        char *word_name;
        int word_type;
        struct word *next;
};
struct word *word_list; /* first element in word list */
extern void *malloc() ;
int
add_word(int type, char *word)
    {
    struct word *wp;
        if(lookup_word(word) != LOOKUP) {
        printf("!!! warning: word %s already defined \n", word);
        return 0;
        }
    /* word not there, allocate a new entry and link it on the list */
    wp = (struct word *) malloc(sizeof(struct word));
    wp->next = word_list;
    /* have to copy the word itself as well */
    wp->word_name = (char *) malloc(strlen(word)+1);
    strcpy(wp->word_name, word);
    wp->word_type = type;
    word_list = wp;
    return 1; /* it worked */
    }
    int
    lookup_word(char *word)
    {
    struct word *wp = word_list;
    /* search down the list looking for the word */
    for(; wp; wp = wp->next) {
        if(strcmp(wp->word_name, word) == 0)
        return wp->word_type;
    }
return LOOKUP; /* not found */
}

Yacc file,

%{
/*
* A lexer for the basic grammar to use for recognizing English sentences.
*/
#include <stdio.h>
%}
%token NOUN PRONOUN VERB ADVERB ADJECTIVE PREPOSITION CONJUNCTION
%%
sentence: subject VERB object{ printf("Sentence is valid.\n"); }
;
subject: NOUN
| PRONOUN
;
object: NOUN
;
%%
extern FILE *yyin;
main()
{
do
{
yyparse();
}
while (!feof(yyin));
}
yyerror(s)
char *s;
{
fprintf(stderr, "%s\n", s);
}

Header file, had to create 2 versions for some values not sure why but code was having an issue with them, and I wasn't understanding why so I just created a token with the full name and the shortened as the book had only one for each.

# define NOUN 257
# define PRON 258
# define VERB 259
# define ADVERB 260
# define ADJECTIVE 261
# define PREPOSITION 262
# define CONJUNCTION 263
# define ADV 260
# define ADJ 261
# define PREP 262
# define CONJ 263
# define PRONOUN 258

Solution

    1. If you feel that there is a problem with your linked list implementation, you'd be a lot better off testing and debugging it with a simple driver program rather than trying to do that with some tools (flex and bison) which you are still learning. On the whole, the simpler a test is and the fewest dependencies which it has, the easier it is to track down problems. See this useful essay by Eric Clippert for some suggestions on debugging.

    2. I don't understand why you felt the need to introduce "short versions" of the token IDs. The example code in Levine's book does not anywhere use these symbols. You cannot just invent symbols and you don't need these abbreviations for anything.

      The comment that you "had to create 2 versions [of the header file] for some values" but that the "code was having an issue with them, and I wasn't understanding why" is far too unspecific for an answer. Perhaps the problem was that you thought you could use identifiers which are not defined anywhere, which would certainly cause a compiler error. But if there is some other issue, you could ask a question with an accurate problem description (that is, exactly what problem you encountered) and a Minimal, Complete, and Verifiable example (as indicated in the StackOverflow help pages).

      In any case, manually setting the values of the token IDs is almost certainly preventing you from being able to recognized inputs. Bison/yacc reserves the values 256 and 257 for internal tokens, so the first one which will be generated (and therefore used in the parser) has value 258. That means that the token values you are returning from your lexical scanner have a different meaning inside bison. Bottom line: Never manually set token values. If your header isn't being generated correctly, figure out why.

    3. As far as I can see, the only legal input for your program has the form:

      sentence: subject VERB object
      

      Since none of your sample inputs ("run", for example) have this form, a syntax error is not surprising. However, the fact that you receive a very early syntax error on the input "cat" does suggest there might be a problem with your symbol table lookup. (That's probably the result of the problem noted above.)