c++ bison flex-lexer parser-generator

Detect the conditions of an if statement using Bison and Flex in C++


I want to get all the conditions of an if statement in C++. If I enter (foo&&bar&&(one&&two)), I want to print foo - bar - one - two.

I've compiled the scanner.l and parser.y files and tested them individually. My lex.yy.c works: if I enter (a&&b), I get 5 tokens, (, a, &&, b and ), as I want. But when I use the .y file with the same input, I get a&&b and b). In this case I get only 2 tokens, and a&&b should have been split into the 3 tokens a, && and b. I also tried a simpler condition, (a): I get ( and a), but I want (, a and ).

I don't know whether I'm doing something wrong or this is a bug; I hope it is my fault.

parser.y

%{
    #include <iostream>
    #include <list>
    #include <stdio.h>
    #include <sstream>
    #include <string>

    using namespace std;

    int yylex(void);
    void yyerror(char *);

    list<string> tokenList;

    #define YYSTYPE char *
%}

%token  PAR_IZQ
        PAR_DER
        SIMBOLO
        FIN
        NADA
        AND
        OR

%start input

%%

input:

    |   input terminos
;

terminos:
        PAR_IZQ terminos PAR_DER    { }
    |   PAR_IZQ condicion PAR_DER   { }
;

condicion:
        terminos AND terminos       { }
    |   SIMBOLO AND terminos        { cout << " 1) CONDITION FOUND: " << $1 << endl; }
    |   terminos AND SIMBOLO        { cout << " 2) CONDITION FOUND: " << $3 << endl; }
    |   SIMBOLO AND SIMBOLO         { cout << " 3) CONDITION FOUND: " << $3 << " AND " << $1 << endl; }
    |   SIMBOLO                     { cout << " 4) CONDITION FOUND: " << $1 << endl; }
;

%%

void yyerror(char *s) {
    fprintf(stderr, "%s\n", s);
}

int main(void) {
    yyparse();
    return 0;
}

scanner.l

%option noyywrap
%{
    #include <iostream>
    #include "parser.tab.c"
    using namespace std;
%}

%%

[a-zA-Z0-9]+  {
    yylval = yytext;
    return SIMBOLO;
}

"&&" {
    return AND;
}

"||" {
    return OR;
}

[ \0\0] {
    return FIN;
}

"("     {
    yylval = yytext;
    return PAR_IZQ;
}

")"     {
    yylval = yytext;
    return PAR_DER;
}

.       {
    cout << "Entrada no permitida.";
    cout << endl << yytext << endl;
    exit(1);
}

%%

main.cpp

#include "mainwindow.h"
#include <QApplication>
#include "lex.yy.c"
#include <iostream>
#include <vector>
#include <string>

using namespace std;

typedef yy_buffer_state *YY_BUFFER_STATE;
extern int yyparse();
extern YY_BUFFER_STATE yy_scan_buffer(char *, size_t);

int main(int argc, char** argv) {

    char condition[] = "(a) \0\0";
    // note yy_scan_buffer is looking for a double null string
    yy_scan_buffer(condition, sizeof(condition));
    yyparse();
    return 0;
}

Please note that the parser's input is a string, so I have to pass it as a parameter to the yy_scan_buffer function. The \0\0 is necessary because otherwise the program would never finish executing. The console prints "syntax error" in red, but I don't know why!


Do you know how I can fix it and get the tokens that I want?

Thanks!

PS: I'm using Windows 8.1, Qt creator 2.8.1, win_bison 2.7 and win_flex 2.5.37.


Solution

  • The problem is that your lexer returns yytext to the parser as the yylval associated with symbol tokens, but yytext is a pointer into the lexer's internal token buffer, which is only valid until the next token is read. So when you later print the tokens in your condition, you get semi-random garbage (since it happens soon afterwards, you mostly just see the same raw token buffer without the terminating null characters).

    You need to make a copy of the string pointed at by yytext before returning it to the parser:

    [a-zA-Z0-9]+  {
        yylval = strdup(yytext);
        return SIMBOLO;
    }
    

    Of course, you also need to keep track of when the string is no longer needed and free it.

    Alternatively, since you're not using %union, change your #define for YYSTYPE to

    #define YYSTYPE  std::string
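
To see why the saved pointers go stale, here is a minimal C++ sketch that imitates the behaviour described above without using flex at all. FakeScanner, scanBorrowing and scanCopying are hypothetical names made up for this illustration. The class is only an assumed, simplified model of a scanner that terminates each token in place inside its own buffer and undoes that when it moves on; real flex buffer management is more involved, so treat the exact strings as illustrative.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a flex scanner (NOT the real flex API): like
// yytext, next() returns a pointer INTO the scanner's own input buffer.
class FakeScanner {
public:
    explicit FakeScanner(const std::string& input)
        : buf_(input.begin(), input.end()) {
        buf_.push_back('\0');  // sentinel marking the end of the input
        held_ = buf_[0];       // nothing has been overwritten yet
    }

    // Returns the next token, or nullptr at the end of the input.
    const char* next() {
        buf_[end_] = held_;  // restore the char hidden by the previous call
        std::size_t start = end_;
        if (buf_[start] == '\0') return nullptr;
        if (std::isalnum(static_cast<unsigned char>(buf_[start]))) {
            // Identifier: consume the whole alphanumeric run.
            while (std::isalnum(static_cast<unsigned char>(buf_[end_]))) ++end_;
        } else if (buf_[start] == '&' && buf_[start + 1] == '&') {
            end_ += 2;  // the "&&" operator
        } else {
            end_ += 1;  // any other single character, e.g. '(' or ')'
        }
        held_ = buf_[end_];  // remember the char that follows the token
        buf_[end_] = '\0';   // terminate the token in place
        return &buf_[start];
    }

private:
    std::vector<char> buf_;
    std::size_t end_ = 0;
    char held_ = '\0';
};

// Keeps the raw pointers (what `yylval = yytext` does) and only reads them
// after scanning has finished.
std::vector<std::string> scanBorrowing(const std::string& input) {
    FakeScanner scanner(input);
    std::vector<const char*> saved;
    while (const char* tok = scanner.next()) saved.push_back(tok);
    return std::vector<std::string>(saved.begin(), saved.end());
}

// Copies each token as soon as it is returned (what strdup, or assigning
// to a std::string, does).
std::vector<std::string> scanCopying(const std::string& input) {
    FakeScanner scanner(input);
    std::vector<std::string> saved;
    while (const char* tok = scanner.next()) saved.emplace_back(tok);
    return saved;
}
```

With the input (a&&b), scanCopying yields the five tokens (, a, &&, b and ), while the second pointer kept by scanBorrowing reads a&&b) once scanning is done, much like the output in the question. Copying with strdup, or storing the value in a std::string (assigning a char * to a std::string copies the characters), corresponds to scanCopying here.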