Tags: parsing, bison, flex-lexer, jison

Getting tokens based on length and position inside input


My input is a stream of characters that are not separated by any delimiter, like this:

    input = "150001"

I want to write a parser (using JISON) that tokenizes based on position and length. These should be my tokens:

15 - system ID (first 2 digits)
0001 - order number (next 4 digits)

Can you give me some advice on how to accomplish this? I tried defining my tokens like this:

    %lex
    %%

    [0-9]{2}    return "SYSTEM_ID"
    [0-9]{4}    return "ORDER_NUM"

    /lex

    %%

But, as expected, this does not work :)
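To see why the two rules above do not split the input as intended, here is a small JavaScript emulation of a lexer that tries the rules in order and takes the first one that matches at the current position. This assumes JISON's default rule selection is first-match (flex, and JISON with `%options flex`, prefer the longest match instead). The `lexFirstMatch` helper is hypothetical and is not part of JISON's API.

```javascript
// Hypothetical emulation of first-match lexing (not JISON's actual code).
// Rules are tried in order; the first regex that matches at the current
// position wins, so rule order, not token length, decides the token.
function lexFirstMatch(input, rules) {
  const tokens = [];
  let pos = 0;
  while (pos < input.length) {
    let matched = false;
    for (const [pattern, name] of rules) {
      // The sticky flag "y" anchors the match exactly at lastIndex.
      const re = new RegExp(pattern.source, "y");
      re.lastIndex = pos;
      const m = re.exec(input);
      if (m) {
        tokens.push([name, m[0]]);
        pos += m[0].length;
        matched = true;
        break;
      }
    }
    if (!matched) throw new Error(`Unrecognized text at position ${pos}`);
  }
  return tokens;
}

// The same two rules as in the question, in the same order.
const rules = [
  [/[0-9]{2}/, "SYSTEM_ID"],
  [/[0-9]{4}/, "ORDER_NUM"],
];

const tokens = lexFirstMatch("150001", rules);
console.log(tokens.map(([name, text]) => `${name}(${text})`).join(" "));
// SYSTEM_ID(15) SYSTEM_ID(00) SYSTEM_ID(01)
```

Swapping the rule order only moves the problem: ORDER_NUM then grabs "1500" and SYSTEM_ID gets the trailing "01". Neither pattern knows where in the record it is, which is why position has to be tracked some other way.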

Is there a way to parse this kind of input, where tokens are determined by their length in characters?


Solution

  • You can make a simple lexer using start conditions: declare states and assign a state to each of those rules. Following JISON's documentation, it would change to something like this (note that your lexer is still incomplete, because it does nothing with the identifier or the "="):

    %lex
    %s system_id order_num
    %%
     /* some more logic is needed to accept the identifier and then "=",
        each with its own state, before beginning the "system_id" state.
      */
     <system_id>[0-9]{2}    { this.begin("order_num"); return "SYSTEM_ID"; }
     <order_num>[0-9]{4}    { this.begin("INITIAL"); return "ORDER_NUM"; }
    /lex

    %%
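The effect of the start conditions can also be sketched outside JISON. Below is a minimal JavaScript emulation, assuming the input is only the digit string (the identifier and "=" are left out, as in the lexer above) and that lexing starts in the `system_id` state. The `lexWithStates` function and its rule table are hypothetical, written to mirror the two rules above rather than JISON's generated lexer.

```javascript
// Hypothetical emulation of start conditions (not JISON's generated lexer).
// Each rule is only active in its own state, and matching it switches the
// current state, like this.begin(...) in the JISON actions.
const rules = [
  { state: "system_id", pattern: /[0-9]{2}/y, token: "SYSTEM_ID", next: "order_num" },
  { state: "order_num", pattern: /[0-9]{4}/y, token: "ORDER_NUM", next: "INITIAL" },
];

function lexWithStates(input, startState) {
  const tokens = [];
  let state = startState;
  let pos = 0;
  while (pos < input.length) {
    let matched = null;
    for (const rule of rules) {
      if (rule.state !== state) continue; // rule is inactive in this state
      rule.pattern.lastIndex = pos;       // sticky regex: match exactly at pos
      const m = rule.pattern.exec(input);
      if (m) {
        matched = { rule, text: m[0] };
        break;
      }
    }
    if (!matched) {
      throw new Error(`No rule matches in state "${state}" at position ${pos}`);
    }
    tokens.push([matched.rule.token, matched.text]);
    pos += matched.text.length;
    state = matched.rule.next; // equivalent of this.begin(next)
  }
  return tokens;
}

const result = lexWithStates("150001", "system_id");
console.log(result.map(([name, text]) => `${name}(${text})`).join(" "));
// SYSTEM_ID(15) ORDER_NUM(0001)
```

Because each pattern is only tried in its own state, rule order no longer decides the split; the position in the record is carried by the current state.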