bison flex-lexer yacc

yyg->yy_buffer_stack points to garbage after yylex_init


After some hours of debugging, I found that yyg->yy_buffer_stack points to memory it isn't supposed to, which later leads to a segmentation fault:

(gdb) print yyg->yy_buffer_stack ? yyg->yy_buffer_stack[yyg->yy_buffer_stack_top] : 0
Cannot access memory at address 0x2aaaaaab0c680
(gdb) print yyg->yy_buffer_stack
$8 = (YY_BUFFER_STATE *) 0x40000

When I try reading the code of the generated lexer, it is full of obscure and suspicious-looking constructs such as negative indexing and intentionally initializing variables to garbage. It's not at all obvious how the pointer could have been set to this bizarre value. Any guesses as to what might have caused it? Below is the code where I call yyparse:

int main(int argc, char** argv) {
    int res;
    if (argc == 2) {
        yyscan_t yyscanner;

        yylex_init(&yyscanner);
        FILE* h = fopen(argv[1], "rb");

        if (h == NULL) {

            fprintf(stderr, "Couldn't open: %s\n", argv[1]);
            return errno;
        }
        yyset_in(h, yyscanner);
        fprintf(stderr, "Scanner set\n");
        res = yyparse(&yyscanner);
        fprintf(stderr, "Parsed\n");
        yylex_destroy(&yyscanner);
        return res;
    }
    if (argc > 2) {
        fprintf(stderr, "Wrong number of arguments\n");
    }
    print_usage();
    return 1;
}

Solution

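  • You are calling yyparse with the address of yyscanner, but you're supposed to call it with the value; the same applies to your yylex_destroy call. yylex_init stores in yyscanner a pointer to a block of memory in which yylex keeps its persistent state, and that pointer is what the later calls expect. The address of yyscanner is instead a location on the C stack, so the scanner interprets whatever happens to be stored there as its state, which is why yy_buffer_stack holds garbage. Because yyscan_t is a typedef for void *, the compiler will typically accept the wrong pointer without a warning.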

    Note the example from the Flex manual:

    int main ( int argc, char * argv[] )
    {
        yyscan_t scanner;

        yylex_init ( &scanner );
        yylex ( scanner );
        yylex_destroy ( scanner );
        return 0;
    }
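
To see why the compiler stays silent about this mistake, here is a minimal sketch that does not use flex at all. The names handle_t, struct state, handle_init, handle_destroy and address_differs_from_value are hypothetical stand-ins invented for this illustration; only the pattern (an opaque void * handle filled in by an init function) is meant to mirror yyscan_t and yylex_init.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for an opaque handle type such as yyscan_t. */
typedef void *handle_t;

/* Hypothetical stand-in for the scanner's private state. */
struct state {
    int line_count;
};

/* Like yylex_init: allocates the state and stores its address in *out. */
int handle_init(handle_t *out)
{
    struct state *s = calloc(1, sizeof *s);
    if (s == NULL)
        return 1;
    *out = s;
    return 0;
}

/* Like yylex_destroy: takes the handle by value, not by address. */
void handle_destroy(handle_t h)
{
    free(h);
}

/* Returns 1 if &h and h are different pointers, -1 if allocation failed. */
int address_differs_from_value(void)
{
    handle_t h;
    if (handle_init(&h) != 0)
        return -1;

    void *by_value = h;     /* what the callee expects: the heap block */
    void *by_address = &h;  /* converts to void * implicitly, but points at the local variable */

    printf("value:   %p\naddress: %p\n", by_value, by_address);

    int differs = (by_value != by_address);
    handle_destroy(h);      /* destroy with the value as well */
    return differs;
}
```

Calling address_differs_from_value() from main prints two different pointers: the first is the heap block the init function allocated, the second is the location of the local variable. A function that receives the second one and treats it as its state struct reads unrelated stack contents, which is consistent with the garbage seen in gdb.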