Tags: regex, lex, flex-lexer

How can I use lookbehind assertions in lex?


I need positive lookbehind assertions in lex (flex 2.5.35). After reading the documentation, I don't see a direct way to do this. Flex has something similar to a lookahead assertion (the r/s trailing-context syntax, which matches r only when it is followed by s), but nothing for lookbehind. What's the best way to achieve the same effect?

Here's an example. Say I have the following rules in my scanner spec file:

a         printf("matched a ");
b         printf("matched b ");
c         printf("matched c ");
d         printf("matched d ");

How would I match a 'd' that follows a 'b', while still matching the 'b' itself, so that the input 'abd' produces:

matched a matched b matched d following b

but the input 'acd' produces:

matched a matched c matched d

The rule:

bd        printf("matched d following b ");

obviously doesn't work since it consumes the b; for 'abd' it outputs:

matched a matched d following b

If I had PCRE lookbehinds, I could write:

(?<=b)d   printf("matched d following b ");

and all would be well, but lex doesn't support this.


Solution

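  • You can probably achieve what you want with start conditions, at the cost of a more complicated scanner. Start conditions let you enable rules conditionally, based on what has been matched previously. Here is an example using an exclusive start condition (%x), in which only rules explicitly tagged with that condition are active: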

    %option noyywrap
    %x matched_b
    %%
    <INITIAL,matched_b>{
        a       { printf("matched a\n"); BEGIN(INITIAL); }
        b       { printf("matched b\n"); BEGIN(matched_b); }
        c       { printf("matched c\n"); BEGIN(INITIAL); }
    }
    
    d               printf("matched d\n");
    <matched_b>d    { printf("matched d following b\n"); BEGIN(INITIAL); }
    %%
    /* noyywrap plus this main() let the scanner build without linking -lfl. */
    int main(void) { return yylex(); }
    

    With this scanner I get:

    $ echo abcd | ./5615080
    matched a
    matched b
    matched c
    matched d
    
    $ echo abdd | ./5615080
    matched a
    matched b
    matched d following b
    matched d