Tags: c, parsing, bison, yacc, lex

How can we define rules for identifying a certain sequence in a given series?


I would like to know whether it is possible to identify a certain sequence inside a given series.

My lex scanner produces three different tokens: START, AMINO, STOP. I want to use YACC to identify all the sequences that start with START, contain a series of AMINO tokens, and end with STOP. Example: START AMINO AMINO ... AMINO STOP

I have never used YACC/bison before, so I have tried:

%%
seq_2: START seq_1 STOP {printf("%s", $2);};
seq_1: seq_1 AMINO
%%

But these rules don't work.

  • Is it possible (and practical) to solve this problem with lex and YACC?
  • If so, what would be a good way to do it?

Solution

  • You're probably getting an error because there's no semicolon at the end of your 'seq_1' rule. For example:

    seq_1 : seq_1 AMINO ;
    

    Also, as currently written, seq_1 can never terminate: its only rule refers to itself, so there is no way to finish deriving it. You can fix that by adding a second, non-recursive rule that serves as the base case.

    If it is valid for 'seq_1' to be empty then you can do that as follows:

    seq_1 : seq_1 AMINO ;
    seq_1 : ;
    

    Or, as it is more typically written:

    seq_1 : seq_1 AMINO
          |
          ;
    

    If there should always be at least one AMINO between START and STOP then do it this way:

    seq_1 : AMINO
          | seq_1 AMINO
          ;
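
To make it concrete what the "at least one AMINO" grammar accepts, here is a minimal hand-written sketch in C that checks a token array against the same two rules. It is not what yacc/bison generates, only an illustration of the language those rules describe. The `enum token` values and the function name `match_sequence` are assumptions made up for this example; in a real build the token codes would come from the header that yacc/bison generates.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical token codes for illustration only; real ones would come
   from the yacc/bison-generated header, not from this enum. */
enum token { START = 1, AMINO, STOP };

/* Mirrors:  seq_2 : START seq_1 STOP ;
             seq_1 : AMINO | seq_1 AMINO ;
   Returns the number of AMINO tokens if the n tokens form exactly one
   valid sequence, or -1 otherwise. */
int match_sequence(const enum token *tokens, size_t n)
{
    assert(tokens != NULL);

    size_t i = 0;
    int aminos = 0;

    if (i >= n || tokens[i] != START)   /* seq_2 must begin with START */
        return -1;
    i++;

    /* seq_1: each iteration corresponds to one use of "seq_1 AMINO";
       leaving the loop corresponds to the non-recursive base case. */
    while (i < n && tokens[i] == AMINO) {
        aminos++;
        i++;
    }
    if (aminos == 0)                    /* "at least one AMINO" variant */
        return -1;

    if (i >= n || tokens[i] != STOP)    /* seq_2 must end with STOP */
        return -1;
    i++;

    return i == n ? aminos : -1;        /* reject trailing tokens */
}
```

Calling `match_sequence` on START AMINO AMINO AMINO STOP returns 3, while START STOP is rejected with -1. To model the variant where seq_1 may be empty, drop the `aminos == 0` check.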