yacc lex

How do I make lex/yacc match strings longer than 9000 characters?


I use the rule '[^']*\' to make lex match strings. It works fine when the string is shorter than 9000 characters. How do I get lex to match strings longer than that?

Should I change the rule, or do I have to configure something? Any help would be appreciated.


Solution

  • You can switch from the predefined INITIAL state to another start condition, SQSTR, when you encounter '. Within SQSTR, you switch back to INITIAL when you encounter an unescaped '. Otherwise, you stay in SQSTR and append the character to the token, one character at a time, so no single match has to cover the whole string. How you optimally manage errors and string growth with respect to memory allocation is an exercise left to the reader. Multi-line strings are also straightforward. And, of course, you should recognize an obvious refactoring opportunity (the duplicated append code), which should be glaring red if you try to add multi-line string support.

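    %{
    #include <stdio.h>
    #include <stdlib.h>
    
    char *str;   /* string collected so far, always NUL-terminated */
    int len;     /* number of characters in str, excluding the NUL */
    %}
    
    %s SQSTR
    %%
    
    <INITIAL>' {
        /* opening quote: start with an empty string */
        str = malloc(1);
        len = 0;
        *str = 0;
        BEGIN(SQSTR);
    }
    <SQSTR>\\' {
        /* escaped quote: append a literal '; len+2 leaves room for the NUL */
        str = realloc(str, len+2);
        str[len] = '\'';
        str[len+1] = 0;
        len++;
    }
    <SQSTR>' {
        /* unescaped quote: the string is complete */
        printf("length of str is %d. First 10 is '%.10s' and last 10 are '%s'", len, str, len>=10 ? str+len-10 : str);
        free(str);
        BEGIN(INITIAL);
    }
    <SQSTR>. {
        /* any other character: append it; len+2 leaves room for the NUL */
        str = realloc(str, len+2);
        str[len] = *yytext;
        str[len+1] = 0;
        len++;
    }
    
    %%
    
    int yywrap () {
        return 1;
    }
    
    int main (int argc, char *argv[]) {
        yylex();
        return 0;
    }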
    
    $ wc bigger
           1       5   16337 bigger
    
    $ flex t.l && gcc -g lex.yy.c && ./a.out < bigger
    length of str is 16334. First 10 is 'aaaaaaaaaa' and last 10 are 'aaaaaaaaaa'
    

    Edit #1: In the original post, I mistakenly placed the more general . rule before the ' rule. Silly me.

    Edit #2: Added main and debugging output.
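
For the string growth and error handling left as an exercise above, one possible approach is a small helper that doubles the buffer capacity when it runs out and reports allocation failure to the caller. This is only a sketch: `struct strbuf`, `strbuf_append`, `strbuf_free`, and the initial capacity of 64 are hypothetical names and values chosen for illustration. They are not part of lex/flex or of the scanner above.

```c
#include <stdlib.h>

/* Hypothetical helper type: a growable, NUL-terminated string buffer. */
struct strbuf {
    char  *data;   /* NULL until the first append */
    size_t len;    /* characters stored, excluding the NUL */
    size_t cap;    /* bytes currently allocated */
};

/* Append one character. Returns 0 on success, -1 if allocation fails.
   On failure the buffer is left unchanged, so the caller can still free it. */
static int strbuf_append(struct strbuf *b, char c)
{
    if (b->len + 2 > b->cap) {                    /* need room for c and the NUL */
        size_t ncap = b->cap ? b->cap * 2 : 64;   /* assumed initial size: 64 */
        char *p = realloc(b->data, ncap);
        if (p == NULL)
            return -1;
        b->data = p;
        b->cap = ncap;
    }
    b->data[b->len++] = c;
    b->data[b->len] = '\0';
    return 0;
}

/* Release the storage and reset the buffer to its empty state. */
static void strbuf_free(struct strbuf *b)
{
    free(b->data);
    b->data = NULL;
    b->len = 0;
    b->cap = 0;
}
```

With something like this, both appending actions in SQSTR could shrink to a single call, for example `strbuf_append(&sb, *yytext)` with `sb` declared as a zero-initialized `struct strbuf`, which also takes care of the duplicated code. Doubling keeps the number of realloc calls logarithmic in the string length instead of one per character.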