Search code examples
grammarbisonyacc

In Bison (or yacc for that matter), is there an order defined by a grammar?


I have the following grammar in a Bisone file:

item
    : "ITEM" t_name t_type v_storage t_prefix t_tag ';'
    ;
t_name
    : [$_A-Za-z][$_A-Z0-9a-z]*
    ;
t_type
    : "BYTE"
    | "WORD"
    | "LONG"
    | "QUAD"
    ;
v_storage
    : %empty
    | "TYPEDEF"
    ;
t_prefix
    : %empty
    | "PREFIX" t_name
    ;
t_tag
    : %empty
    | "TAG" t_name
    ;

When I attempt to parse the following string ITEM foobar BYTE PREFIX str_ TAG S TYPEDEF; I get an unexpected 'TYPEDEF" and it accepts the ";". Is there something I need to do to allow any order to be specified? If so, I'm hoping that there is a simple solution. Otherwise, I'll need to do a little more work.


Solution

  • It is not possible to tell bison (or yacc) that order doesn't matter. Rules are strictly ordered.

    So you have two options:

    1. List all possible orders. If you do this, watch out for ambiguities caused by optional productions. You'll actually need to list all orders and subsets. That mounts up exponentially.

    2. Just accept any list of components, as a list. That will accept repeated components so you'll need to catch that in the semantic action if you care.

    The second option is almost always the one you want. Implementation is usually trivial, because you will want to store the components somewhere; as long as that somewhere has a unique value (such as NULL) which means "not yet set", then you only need to test that value before setting it. For example rather than the one in the question):

    %{
       #include <stdbool>
       enum Type { 
         TYPE_DEFAULT = 0, TYPE_BYTE, TYPE_WORD, TYPE_LONG, TYPE_QUAD
       };
       typedef struct Item Item;
       struct Item {
         const char *name;
         enum Type   type;
         int         storage; /* 0: unset, 1: TYPEDEF */
         const char *prefix;
         const char *tag;
      };
      // ...
      // Relies on the fact that NULL and 0 are converted to boolean 
      // false. Returns true if it's ok to do the set (i.e. thing
      // wasn't set).
      bool check_dup(bool already_set, const char* thing) {
        if (already_set) 
          fprintf(stderr, "Duplicate %s ignored at line %d\n", thing, yylineno);
        return !already_set;
      }
    %}
    
    %union {
       const char *str;
       Item  *item;
       // ...
    }
    
    %type <item> item item-def
    %token <str> NAME STRING
    
    %%
    /* Many of the actions below depend on $$ having been set to $1.
     * If you use a template which doesn't provide that guarantee, you
     * will have to add $$ = $1; to some actions.
     */
    item: item-def { /* Do whatever is necessary to finalise $1 */ }
    item-def
        : "ITEM" NAME
                   { $$ = calloc(1, sizeof *$$); $$->name = $2; }
        | item-def "BYTE"
                   { if (check_dup($$->type, "type") $$->type = TYPE_BYTE; }
        | item-def "WORD"
                   { if (check_dup($$->type, "type") $$->type = TYPE_WORD; }
        | item-def "LONG"
                   { if (check_dup($$->type, "type") $$->type = TYPE_LONG; }
        | item-def "QUAD"
                   { if (check_dup($$->type, "type") $$->type = TYPE_QUAD; }
        | item-def "TYPEDEF"
                   { if (check_dup($$->storage, "storage") $$->storage = 1; }
        | item-def "PREFIX" STRING
                   { if (check_dup($$->prefix, "prefix") $$->prefix = $3; }
        | item-def "TAG" STRING
                   { if (check_dup($$->tag, "tag") $$->tag = $3; }
    

    You can separate all those item-def productions into something like:

    item-def: "ITEM" NAME   { /* ... */ }
            | item-def item-option
    item-option: type | storage | prefix | tag
    

    But then in the actions you need to get at the item object, which is not part of the option production. You can do that with a Bison feature which lets you look into the parser stack:

    prefix: "PREFIX" STRING { if (check_dup($<item>0->prefix, "prefix")
                                $<item>0->prefix = $2; }
    

    In this context, $0 will refer to whatever came before prefix, which is whatever came before item-option, which is an item-def. See the end of this section in the Bison manual, where it describes this practice as "risky", which it is. It also requires you to explicitly specify the tag, because bison doesn't do the grammar analysis necessary to validate the use of $0, which would identify its type.