
Serializing bison recursive rule to a comma separated string or a vector


I have the following grammar:

program
    :
    | program stm 
    ;

stm 
    : stmt 
    | expr
    ;

stmt 
    : import_stmt 
    ;

expr
    : value
    ;

value
    : STRING | DECIMAL | FLOAT | ID 
    ;

import_stmt
    : KW_IMPORT import_ids END {
        std::cout << "importing " << $2 << "\n"; 
      } 
    ;

import_ids
    : ID
    | import_ids "," ID
    ;

What I want to achieve is for the value of import_ids to be a comma-separated string. So if I parse the following code: import a, b, c, d; the value of the $2 placeholder in the import_stmt rule should be either the comma-separated string a, b, c, d, or a vector with these letters as its members.

I haven't actually tried anything, since I don't know how to do this. I tried googling but found nothing.


Solution

  • In order to use semantic values in the actions (the $2 you have), you need to tell bison which types those values have, and you need to set the computed value ($$) consistently in every action.

    If you're using a C++-based parser, this is reasonably straightforward: you need to declare the type of the value associated with each terminal and non-terminal. So something like:

    %language "C++"
    %define api.value.type variant
    
    %code requires {
      #include <string>  // std::string must be visible in the generated header
    }
    
    %token<std::string> ID         // IDs have a string associated with them
    %type<std::string> import_ids  // as does this non-terminal
    // ... %type for other non-terminals and %token for other token types
    
    %%
    
    // ... various rules
    
    import_ids
        : ID  { $$ = $1; }                              // first ID: the list is just that ID
        | import_ids ',' ID { $$ = $1 + ", " + $3; }    // append ", " and the next ID
        ;
    

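    This uses std::string's built-in concatenation (operator+) to build the string. Because the rule is left-recursive, $1 already holds the comma-separated list of all the earlier IDs, so each reduction only has to append ", " and the next ID. However, it also requires bison's C++ interface, which changes how the lexer passes tokens and their values to the parser, so a lexer written for a C parser will need to be adapted.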

    If you're not using a C++-based parser (that is, you're using a C parser with the traditional yacc %union, or with %define api.value.type union), then you can't put non-trivial C++ types such as std::string in the union, because they won't be constructed and destroyed properly. If you want strings in that case, you'll have to use plain char* strings and manage their memory yourself: allocate each one, and free it once the action that consumes it no longer needs it, to avoid leaks.