Tags: bison, yacc, bisonc++

Accessing The Value Of A Production In Bison As Well As Assigning Types From Another Production


I am working with Bison and am currently stuck on a problem; I'm relatively new to how all of this works. I need to be able to tell whether a specific production is, say, > or + or >=, but I'm not sure of the best way to store and retrieve this value. Here is the code:

%union 
{
  char* text;
  TYPE_INFO typeInfo;
};

N_ARITHLOGIC_EXPR   : N_UN_OP N_EXPR
            {
                if($2.type == FUNCTION){
                    yyerror("Arg 1 cannot be function");
                }
                $$.type = BOOL;
                $$.numParams = NOT_APPLICABLE;
                $$.returnType = NOT_APPLICABLE;

            }
            | N_BIN_OP N_EXPR N_EXPR
            {
                if(T_LT || T_GT || T_LE || T_GE || T_EQ || T_NE || T_NOT){
                    if(!(($2.type == INT && $3.type == INT) || ($2.type == STR && $3.type == STR))){
                        yyerror("Arg n must be integer or string");
                    }
                    else{
                        $$.type = BOOL;
                        $$.numParams = NOT_APPLICABLE;
                        $$.returnType = NOT_APPLICABLE;
                    }
                }
                else if(T_AND || T_OR){
                    if(($2.type == INT && $3.type == FUNCTION) || ($2.type == STR && $3.type == FUNCTION) || ($2.type == BOOL && $3.type == FUNCTION) || ($2.type == FUNCTION && $3.type == FUNCTION)){
                        yyerror("Arg n cannot be a function");
                    }
                    else{
                        $$.type = BOOL;
                        $$.numParams = NOT_APPLICABLE;
                        $$.returnType = NOT_APPLICABLE;                         
                    }
                }
                else if (T_ADD || T_SUB || T_MULT || T_DIV){
                    if(!($2.type == INT && $3.type == INT)){
                        yyerror("Arg n must be integer");
                    }
                    else{
                        $$.type = INT;
                        $$.numParams = NOT_APPLICABLE;
                        $$.returnType = NOT_APPLICABLE;
                    }
                }
            }
            ;

The if statements obviously don't work as written, but I just need to be able to tell whether it's a relational operator, an arithmetic operator, etc.

Also, later on I need to use one production to determine the type of another, for example:

N_PROGN_OR_USERFUNCTCALL : N_FUNCT_NAME N_ACTUAL_PARAMS
            {

            }
            | T_LPAREN N_LAMBDA_EXPR T_RPAREN N_ACTUAL_PARAMS
            {

            }
            ;
N_FUNCT_NAME        : T_PROGN
            {

                //Change type of N_PROGN_OR_USERFUNCTCALL based off of the function return type of T_PROGN

            }

Depending on the return type of T_PROGN, I need to be able to change the type of N_PROGN_OR_USERFUNCTCALL. What is the best way to go about this? Thank you!


Solution

  • The most common style for keywords (in the general sense, including operators such as > and <=) is to make each one a unique terminal token. In some cases, that leads to a certain amount of repetition in the grammar, but it avoids excess dependencies between the scanner and the parser. So it's a bit of a balancing act.

    If you want to conflate two keywords with different semantics into a single terminal, you can do that by making the semantic type of the terminal an enum (or moral equivalent) and setting it in the lexer. Alternatively, you can keep the terminals separate and combine them in the grammar.

    All of the following have their use cases, and they are really not that different:

    Conflated terminal:

     // Scanner patterns
    "<="       yylval.opcode = OP_LE;   return T_BINOP;
    "<"        yylval.opcode = OP_LT;   return T_BINOP;
    "+"        yylval.opcode = OP_PLUS; return T_BINOP;
     // etc.
    
     // grammar (assumes the %union also has a member "enum opcode opcode;")
    %token <opcode> T_BINOP
    
    %%
    
    expr: T_BINOP expr expr {
            switch ($1) {
              case OP_LT: case OP_LE: case OP_EQ: ...
                if (check_compare($2, $3)) {
                  $$ = (TYPE_INFO){ .type = BOOL,
                                    .numParams = NOT_APPLICABLE,
                                    .returnType = NOT_APPLICABLE };
                } else {
                  yyerror(...);
                }
                break;
              case OP_PLUS: case OP_MINUS: case OP_TIMES: ...
                if (check_arith($2, $3)) {
                  // ...
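    With a conflated T_BINOP, the action only receives an opcode, so "is this relational or arithmetic?" becomes a lookup on that value. Here is a minimal C sketch of such a lookup; the opcode names mirror the example above, while the op_class enum and the classify_op function are hypothetical helpers, not part of Bison or flex.

```c
#include <assert.h>

/* Hypothetical opcode values the scanner attaches to T_BINOP
   (names follow the answer's example). */
enum opcode {
    OP_LT, OP_LE, OP_GT, OP_GE, OP_EQ, OP_NE,  /* relational */
    OP_AND, OP_OR,                             /* logical    */
    OP_PLUS, OP_MINUS, OP_TIMES, OP_DIV        /* arithmetic */
};

/* The categories the type-checking action needs to tell apart. */
enum op_class { CLASS_RELATIONAL, CLASS_LOGICAL, CLASS_ARITHMETIC };

/* Map an opcode to its category so the parser action can switch on the
   category instead of listing every operator. */
enum op_class classify_op(enum opcode op)
{
    switch (op) {
    case OP_LT: case OP_LE: case OP_GT:
    case OP_GE: case OP_EQ: case OP_NE:
        return CLASS_RELATIONAL;
    case OP_AND: case OP_OR:
        return CLASS_LOGICAL;
    case OP_PLUS: case OP_MINUS: case OP_TIMES: case OP_DIV:
        return CLASS_ARITHMETIC;
    }
    /* Only reachable if op holds a value outside the enum. */
    assert(!"classify_op: unknown opcode");
    return CLASS_ARITHMETIC;
}
```

    In the action, switch (classify_op($1)) { case CLASS_RELATIONAL: ... } can then replace the long runs of case labels, and adding an operator means touching only classify_op.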
    

    Individual terminals:

     // Scanner patterns
    "<="       return OP_LE;
    "<"        return OP_LT;
    "+"        return OP_PLUS;
     // etc.
    
     // grammar
    %token OP_LE "<=" OP_LT "<" 
           OP_PLUS "+"
           ...
    
    %%
    
    expr: "<=" expr expr {
            if (check_compare($2, $3)) {
              $$ = (TYPE_INFO){ .type = BOOL,
                                .numParams = NOT_APPLICABLE,
                                .returnType = NOT_APPLICABLE };
            } else {
              yyerror(...);
            }
          }
        | "+" expr expr {
            if (check_arith($2, $3)) {
              $$ = (TYPE_INFO){ .type = INT,
                                .numParams = NOT_APPLICABLE,
                                .returnType = NOT_APPLICABLE };
            } else {
              yyerror(...);
            }
          }
        // ...
    

    Or, alternative grammar:

    %token OP_LE "<=" OP_LT "<" 
           OP_PLUS "+" OP_TIMES "*"
           ...
    %type <opcode> cmp_op arith_op ...
    %%
    cmp_op: "<="  { $$ = OP_LE; }
          | "<"   { $$ = OP_LT; }
          // ...
          ;
    arith_op: "+" { $$ = OP_PLUS; }
            | "*" { $$ = OP_TIMES; }
            // ...
            ;
    expr: cmp_op expr expr {
            if (check_compare($2, $3)) {
              $$ = (TYPE_INFO){ .type = BOOL,
                                .numParams = NOT_APPLICABLE,
                                .returnType = NOT_APPLICABLE };
            } else {
              yyerror(...);
            }
          }
        | arith_op expr expr {
            if (check_arith($2, $3)) {
              $$ = (TYPE_INFO){ .type = INT,
                                .numParams = NOT_APPLICABLE,
                                .returnType = NOT_APPLICABLE };
            } else {
              yyerror(...);
            }
          }
        // ...
    

    Note: None of the above actually saves the carefully computed opcode, nor the operands. Then again, neither did the code in the question. In any case, that part doesn't differ significantly from version to version.
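    The examples call check_compare and check_arith without defining them. Below is one way they might look in C, assuming the TYPE_INFO layout and the INT/STR/BOOL/FUNCTION/NOT_APPLICABLE constants implied by the question (their real definitions aren't shown, so the values here are placeholders). The rules come from the question's actions: comparisons need two INTs or two STRs, arithmetic needs two INTs. The hypothetical check_logic rejects a function in either operand, which is what the question's error message says; the question's own condition only inspects the second operand.

```c
#include <assert.h>
#include <stdbool.h>

/* Placeholder definitions: the question uses these names but does not
   show them, so the numeric values are assumptions. */
enum { NOT_APPLICABLE = -1, INT = 1, STR, BOOL, FUNCTION };

typedef struct {
    int type;        /* INT, STR, BOOL or FUNCTION */
    int numParams;   /* meaningful only when type == FUNCTION */
    int returnType;  /* meaningful only when type == FUNCTION */
} TYPE_INFO;

/* Build the TYPE_INFO for a non-function value, as the question's
   actions do by hand for BOOL and INT results. */
TYPE_INFO simple_type(int type)
{
    assert(type != FUNCTION);  /* functions need numParams/returnType */
    TYPE_INFO t = { type, NOT_APPLICABLE, NOT_APPLICABLE };
    return t;
}

/* Relational operators: both operands INT, or both STR. */
bool check_compare(TYPE_INFO a, TYPE_INFO b)
{
    return (a.type == INT && b.type == INT) ||
           (a.type == STR && b.type == STR);
}

/* Arithmetic operators: both operands INT. */
bool check_arith(TYPE_INFO a, TYPE_INFO b)
{
    return a.type == INT && b.type == INT;
}

/* Logical operators: neither operand may be a function. */
bool check_logic(TYPE_INFO a, TYPE_INFO b)
{
    return a.type != FUNCTION && b.type != FUNCTION;
}
```

    With these, an action body shrinks to something like if (check_compare($2, $3)) $$ = simple_type(BOOL); else yyerror(...);, and the type rules live in one place instead of being repeated in every alternative.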


    I'm honestly not sure what you mean by your second question. What is the "type of N_PROGN_OR_USERFUNCTCALL"? Do you mean the enumeration you set as the value of the type member of TYPE_INFO? Or do you mean that there is more than one possible semantic type which you might want N_PROGN_OR_USERFUNCTCALL to be?

    In the latter case, you'll need to rethink your design. In this regard, Bison/yacc has precisely the same semantics as C: the type of a variable is fixed, and you cannot decide at runtime whether x is an int or a double. Non-terminals are the grammatical equivalent of variables, and each non-terminal has a single predeclared type. If you need several alternatives, you can use a discriminated union (or std::variant, in C++), just as you could in the underlying implementation language. (A discriminated union is a struct containing a type enum and a union of values of different types.)
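    A minimal C sketch of such a discriminated union follows. All names (value_kind, semantic_value, make_int, as_int and so on) are hypothetical, and the three kinds are only examples; a real grammar would use whatever alternatives it needs.

```c
#include <assert.h>

/* Which alternative a semantic_value currently holds. */
enum value_kind { VAL_INT, VAL_STR, VAL_FUNC };

/* Extra data carried only by function values. */
struct func_info {
    int numParams;
    enum value_kind returnType;
};

/* Discriminated union: the tag says which member of u is valid. */
typedef struct {
    enum value_kind kind;
    union {
        int ival;
        const char *sval;
        struct func_info fn;
    } u;
} semantic_value;

semantic_value make_int(int v)
{
    return (semantic_value){ .kind = VAL_INT, .u.ival = v };
}

semantic_value make_str(const char *s)
{
    return (semantic_value){ .kind = VAL_STR, .u.sval = s };
}

semantic_value make_func(int numParams, enum value_kind returnType)
{
    return (semantic_value){ .kind = VAL_FUNC,
                             .u.fn = { numParams, returnType } };
}

/* Accessors check the tag before reading, so using the wrong member trips
   an assertion (in builds without NDEBUG) instead of returning garbage. */
int as_int(const semantic_value *v)
{
    assert(v->kind == VAL_INT);
    return v->u.ival;
}

const char *as_str(const semantic_value *v)
{
    assert(v->kind == VAL_STR);
    return v->u.sval;
}
```

    Used as a single %union member, a struct like this lets a non-terminal such as N_PROGN_OR_USERFUNCTCALL keep one declared type while its value can be any of the alternatives; the action inspects kind to decide what it is holding.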