Tags: segmentation-fault, bison, interpreter, flex-lexer, lex

Repetitive lexical analyser, error segmentation fault


I am working on an interpreter, but I have run into some problems.

In my lex:

<INITIAL>\{                 {BEGIN(BLOC);}
<BLOC>[^}]*\}               {BEGIN(INITIAL);strncpy(yylval.sval, yytext, MAXVARSIZE);
                            temp = strlen(yylval.sval);
                            yylval.sval[temp-1] = '\0';
                            return BLOCK;}

The lexer returns the text between { and } as a BLOCK token, and in my bison parser I make that text the flex input buffer:

ifs:
    IF PAREOPEN condition PARECLOSE BLOCK {if($3 > 0){scan_string($5);}}

;
[...]

void scan_string(const char* str)
{
    yy_switch_to_buffer(yy_scan_string(str));

}

int main(int argc, char *argv[]) {
    yyin = stdin;
    do {
        printf("aqui2\n");
        yyparse();
    } while(!feof(yyin));
}

But bison later produces a segmentation fault. I want to restore the buffer to yyin as it was originally.


Solution

  • I don't think this approach will work. See below.

    You should not call yy_switch_to_buffer; yy_scan_string does that automatically. Also, in order to switch back to yyin, you'll need to have an <<EOF>> rule which detects the end of file indication (or end of buffer, in this case) and switches back to yyin. In order to make sure that you preserve the original buffer, you'll need to keep YY_CURRENT_BUFFER in some temporary variable and call yy_switch_to_buffer on that temporary; if you create a new buffer from yyin, you'll lose any buffered input.
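As a rough illustration of the save-and-restore idea described above, the flex fragment below keeps the active buffer in a temporary before scanning the string and switches back to it in an <<EOF>> rule. The variable name `saved_buffer` is hypothetical, the fragment handles only one level of nesting, and it is a sketch rather than a tested drop-in for the code in the question.

```lex
%{
/* Hypothetical variable: the buffer that was active before scan_string(). */
static YY_BUFFER_STATE saved_buffer = NULL;
%}

%%

<<EOF>>   {
            if (saved_buffer == NULL)
                yyterminate();                    /* real end of input */
            yy_delete_buffer(YY_CURRENT_BUFFER);  /* free the string buffer */
            yy_switch_to_buffer(saved_buffer);    /* resume the original input */
            saved_buffer = NULL;
          }

%%

void scan_string(const char *str)
{
    saved_buffer = YY_CURRENT_BUFFER;  /* keep the yyin buffer and its buffered input */
    yy_scan_string(str);               /* creates the string buffer and switches to it */
}
```

Because the original buffer object is reused rather than recreated from yyin, whatever input flex had already read into it is still there when scanning resumes.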

    A simpler way to manage a stack of input buffers is to use a buffer stack; you can then call yypush_buffer_state to start scanning the new buffer, and yypop_buffer_state in your <<EOF>> rule. However, there is an odd interaction between yy_scan_string and yypush_buffer_state, which requires you to first push a copy of the current buffer and then replace it with the buffer state created by yy_scan_string. See this answer for an example. (You might want to read the relevant section of the Flex manual, which has a complete example although it is for nesting files, not strings.)
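The buffer-stack variant, including the push-a-copy workaround mentioned above, might look roughly like this. It is a minimal sketch that assumes `scan_string` is called from a parser action while a buffer is active; the other rules of the scanner are omitted.

```lex
%%

<<EOF>>   {
            yypop_buffer_state();      /* delete the finished buffer, resume the previous one */
            if (!YY_CURRENT_BUFFER)
                yyterminate();         /* stack is empty: real end of input */
          }

%%

void scan_string(const char *str)
{
    yypush_buffer_state(YY_CURRENT_BUFFER);  /* duplicate the top of the stack */
    yy_scan_string(str);                     /* replace that duplicate with the string buffer */
}
```

The push is there only to reserve a stack slot: yy_scan_string switches to the new buffer by overwriting the top entry, so without the push the original buffer would no longer be on the stack when the <<EOF>> rule pops.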

    Without seeing more of your code, it's hard to know where the segfault comes from. It could be an error in your <<EOF>> handler, which you don't show. It could also be related to the handling of yylval.sval; if that is a pointer to a buffer (i.e. a char*), then it is apparently not initialized anywhere, which is likely to produce an error when you strncpy into it.

    But it seems to me most likely that you have included a fixed-length character array as part of your semantic value union. That's a really bad idea for a number of reasons, not least of which is that it wastes an awful lot of space in the parser stack: every entry will include the fixed-length buffer. Also, fixed-length buffers are never a good idea; you can easily overflow them.
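To make the space cost concrete, here is a small C sketch comparing the two layouts. The union names and the value 256 for MAXVARSIZE are assumptions for illustration only; the question shows neither the union nor the constant.

```c
#include <stddef.h>

#define MAXVARSIZE 256   /* assumed value; the question does not show it */

/* Semantic value with an embedded fixed-length array (the suspected layout). */
union value_with_array {
    int  ival;
    char sval[MAXVARSIZE];
};

/* Semantic value holding only a pointer to separately allocated text. */
union value_with_pointer {
    int   ival;
    char *sval;
};

/* Bytes occupied by `depth` parser-stack slots of each kind. */
size_t array_stack_bytes(size_t depth)
{
    return depth * sizeof(union value_with_array);
}

size_t pointer_stack_bytes(size_t depth)
{
    return depth * sizeof(union value_with_pointer);
}
```

Every slot of the array version is at least MAXVARSIZE bytes, whether or not the token it holds is a string, while a slot of the pointer version is only as large as its largest scalar member.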

    In this case, it cannot work at all because the array will be part of a bison parser stack entry, and that entry will be removed from the stack as soon as the ifs action terminates. That will leave Flex's buffer with a dangling pointer, which is almost certain to create problems.

    So you could try the following:

    • Change the sval member in the union to char* (which will require a variety of changes in your code)

    • Replace the <BLOC> pattern in your flex file with something like this:

      <INITIAL>\{         { BEGIN(BLOC); }
      <BLOC>[^}]*\}       { BEGIN(INITIAL);
                            yylval.sval = malloc(yyleng);
                            memcpy(yylval.sval, yytext, yyleng - 1);
                            yylval.sval[yyleng - 1] = '\0';
                            return BLOCK;
                          }
      
    • Change the ifs action in your bison parser so that it frees the dynamically allocated string buffer (it can do that because yy_scan_string makes a copy).

    • Add an <<EOF>> rule and modify scan_string to use a buffer stack, as above.
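The copy-and-free steps in the list above can be sketched in plain C as follows. The helper name `copy_block_text` is hypothetical. It does the same copy as the flex action shown earlier and adds a check for allocation failure.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a heap-allocated, NUL-terminated copy of the
 * first len - 1 bytes of text, i.e. the lexeme without its closing '}'.
 * Returns NULL if allocation fails. The caller owns the result. */
char *copy_block_text(const char *text, size_t len)
{
    assert(text != NULL && len >= 1);   /* the pattern always matches the '}' */
    char *copy = malloc(len);           /* len - 1 bytes of text plus the NUL */
    if (copy == NULL)
        return NULL;
    memcpy(copy, text, len - 1);
    copy[len - 1] = '\0';
    return copy;
}
```

In the flex action this would be used as `yylval.sval = copy_block_text(yytext, yyleng);`, and the ifs action would call `free($5)` after `scan_string($5)`, which is safe because yy_scan_string works on its own copy of the string.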

    Even with all that, I don't think this is a very good strategy. Changing the flex buffer in the middle of a bison action will only work if the parser has not read a lookahead token, which makes it very fragile. (And it won't work at all with other yacc-like parser generators, which always read a lookahead token.) And it is not obvious how this might end up working with nested blocks, which you will probably want to implement at some point.