
In lex and yacc code, printf in the yacc file is not working


Hi, I am trying to run code from the lex & yacc book by John R. Levine. I have compiled the lex and yacc programs on Linux using the commands

lex example.l
yacc example.y
gcc -o example y.tab.c
./example

The program asks the user to input verbs, nouns, prepositions, etc. in the format

verb accept,admire,reject
noun jam,pillow,knee

and then runs the grammar in yacc to check whether it's a simple or compound sentence. But when I type jam reject knee

it shows nothing on screen, where it is supposed to show the line "Parsed a simple sentence." on parsing. The code is given below.

yacc file

%{
#include <stdio.h>
/* we found the following required for some yacc implementations. */
/* #define YYSTYPE int */
%}

%token NOUN PRONOUN VERB ADVERB ADJECTIVE PREPOSITION CONJUNCTION

%%

sentence: simple_sentence   { printf("Parsed a simple sentence.\n"); }
    | compound_sentence { printf("Parsed a compound sentence.\n"); }
    ; 

simple_sentence: subject verb object
    |   subject verb object prep_phrase
    ;

compound_sentence: simple_sentence CONJUNCTION simple_sentence
    |   compound_sentence CONJUNCTION simple_sentence
    ;

subject:    NOUN
    |   PRONOUN
    |   ADJECTIVE subject
    ;

verb:       VERB
    |   ADVERB VERB
    |   verb VERB
    ;

object:     NOUN
    |   ADJECTIVE object
    ;

prep_phrase:    PREPOSITION NOUN
    ;

%%

extern FILE *yyin;

main()
{
    while(!feof(yyin)) {
        yyparse();
    }
}

yyerror(s)
char *s;
{
    fprintf(stderr, "%s\n", s);
}

lex file

%{
/*
 * We now build a lexical analyzer to be used by a higher-level parser.
 */

#include "ch1-06y.h"    /* token codes from the parser */

#define LOOKUP 0 /* default - not a defined word type. */

int state; 

%}

%%

\n  { state = LOOKUP; }

\.\n    {   state = LOOKUP;
        return 0; /* end of sentence */
    }

^verb   { state = VERB; }
^adj    { state = ADJECTIVE; }
^adv    { state = ADVERB; }
^noun   { state = NOUN; }
^prep   { state = PREPOSITION; }
^pron   { state = PRONOUN; }
^conj   { state = CONJUNCTION; }

[a-zA-Z]+ { 
         if(state != LOOKUP) {
            add_word(state, yytext);
         } else {
        switch(lookup_word(yytext)) {
        case VERB:
          return(VERB);
        case ADJECTIVE:
          return(ADJECTIVE);
        case ADVERB:
          return(ADVERB);
        case NOUN:
          return(NOUN);
        case PREPOSITION:
          return(PREPOSITION);
        case PRONOUN:
          return(PRONOUN);
        case CONJUNCTION:
          return(CONJUNCTION);
        default:
          printf("%s:  don't recognize\n", yytext);
          /* don't return, just ignore it */
        }
            }
          }

.   ; 

%%
/* define a linked list of words and types */
struct word {
    char *word_name;
    int word_type;
    struct word *next;
};

struct word *word_list; /* first element in word list */

extern void *malloc();

int
add_word(int type, char *word)
{
    struct word *wp;    

    if(lookup_word(word) != LOOKUP) {
        printf("!!! warning: word %s already defined \n", word);
        return 0;
    }

    /* word not there, allocate a new entry and link it on the list */

    wp = (struct word *) malloc(sizeof(struct word));

    wp->next = word_list;

    /* have to copy the word itself as well */

    wp->word_name = (char *) malloc(strlen(word)+1);
    strcpy(wp->word_name, word);
    wp->word_type = type;
    word_list = wp;
    return 1;   /* it worked */
}

int
lookup_word(char *word)
{
    struct word *wp = word_list;

    /* search down the list looking for the word */
    for(; wp; wp = wp->next) {
        if(strcmp(wp->word_name, word) == 0)
            return wp->word_type;
    }

    return LOOKUP;  /* not found */
}

header file

# define NOUN 257
# define PRONOUN 258
# define VERB 259
# define ADVERB 260
# define ADJECTIVE 261
# define PREPOSITION 262
# define CONJUNCTION 263

Solution

  • You have several problems:

    1. The build details you describe do not follow the usual pattern, and in fact they do not work for the code you provide.

    2. Having sorted out how to build your program, it does not work at all, instead segfaulting before reading any input.

    3. Having solved that problem, your expectation of the program's behavior with the given input is incorrect in at least two ways.

    With respect to the build:

    • yacc builds C source for a parser and optionally a header file containing corresponding token definitions. It is usual to exercise the option to get the definitions, and to #include their header in the lexer's source file (#include "y.tab.h"):

      yacc -d example.y

    • lex builds C source for a lexical analyzer. This can be done either before or after yacc, as lex does not depend directly on the token definitions:

      lex example.l

    • The two generated C source files must be compiled and linked together, possibly with other sources as well, and possibly with libraries. In particular, it is often convenient to link in libl (or libfl if your lex is really GNU flex). I linked the latter to get the default yywrap():

      gcc -o example lex.yy.c y.tab.c -lfl
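If you would rather not link libl or libfl just for yywrap(), one alternative is to define it yourself, for example in the user-code section at the end of the lex file. The following is a minimal sketch, assuming the program reads a single input stream and never switches to another one; it is not part of the code in the question.

```c
/* Called by the generated lexer when it reaches end of input.
 * Returning 1 tells the lexer there is no further input to switch to,
 * so it reports end of file to the parser. */
int yywrap(void)
{
    return 1;
}
```

With a definition like this in place, -lfl should no longer be needed to resolve yywrap().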

    With respect to the segfault:

    Your generated program is built around this:

    extern FILE *yyin;
    
    main()
    {
        while(!feof(yyin)) {
            yyparse();
        }
    }
    

    In the first place, you should read Why is “while ( !feof (file) )” always wrong?. Having had that under consideration might have spared you from committing a much more fundamental mistake: evaluating yyin before it has been set. Although it's true that yyin will be set to stdin if you don't set it to something else, that cannot happen at program initialization because stdin is not a compile-time constant. Therefore, when control first reaches the loop control expression, yyin's value is still NULL, and a segfault results.

    It would be safe and make more sense to test for end of file after yyparse() returns.
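A sketch of that reordering is below. So that it compiles on its own, it defines stand-ins for yyin and yyparse(); in the real program those come from the generated sources. The stand-in yyparse() and the helper name parse_until_eof() are hypothetical, introduced only for illustration.

```c
#include <stdio.h>

/* Stand-in: in the real program this is defined in lex.yy.c and you
 * would write "extern FILE *yyin;" instead. */
FILE *yyin;

/* Hypothetical stand-in for the generated parser: it consumes one line.
 * Like a lex-generated scanner, it falls back to stdin if yyin is unset. */
int yyparse(void)
{
    int c;

    if (yyin == NULL)
        yyin = stdin;
    while ((c = getc(yyin)) != EOF && c != '\n')
        ;
    return 0;
}

/* Call yyparse() first, then test for end of file. By the time feof()
 * is evaluated, yyin has been set and at least one read has happened.
 * Returns the number of yyparse() calls made. */
int parse_until_eof(void)
{
    int calls = 0;

    do {
        yyparse();
        calls++;
    } while (!feof(yyin));

    return calls;
}
```

In the actual yacc file you would drop the two stand-ins, keep the extern declaration of yyin, and have main() run the do/while loop (or call a helper like this one).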

    With respect to behavioral expectations:

    You complained that the input

    verb accept,admire,reject
    noun jam,pillow,knee
    jam reject knee
    

    does not elicit any output from the program, but that's not exactly true. That input does not elicit output from the program when it is entered interactively, without afterward sending an end-of-file signal (i.e. by typing control-D at the beginning of a line).

    The parser has not yet detected end-of-file in that case, and it pays no attention at all to newlines, since your lexer notifies it about them only when they immediately follow a period. It therefore has no reason to attempt to reduce its token stack to the start symbol. It could be that you will continue with a prepositional phrase to extend the simple sentence, and it cannot be sure that it won't see a CONJUNCTION next, either. It doesn't print anything because it's waiting for more input. If you end the sentence with a period or afterward send a control-D then it will in fact print "Parsed a simple sentence."