Search code examples
cparsingrecursionbisonflex-lexer

Return the value of the first expression in Bison


I am writing a parser in Bison and flex to parse a "self-made" language. The language contains certain constant expressions of the type:

[OPERATOR, OPERAND1, OPERAND2]

Where an operand can be either a string, integer, real, or another operator expression giving us something like

[OPERATOR, [OPERATOR, OPERAND1, OPERAND2], OPERAND2]

or [OPERATOR, Int1, Int2] etc.

The issue is I want to return on the result of the main expression that calls the other expressions as well. I have made a rule titled as fulloperation to try to deal with this but it only works if the operation expression is followed by another statement (expression is a type of statement). These are my grammar rules and you can find my attempt at recursive return under the fulloperation rule (Note the program starts and ends between a left and right square bracket):

%union{
IntNode intnode;
RealNode realnode;
StrNode strnode;
ExprNode * exprNodePtr;
int linenum;
}

%token <linenum> tADD 
%token <linenum> tSUB 
%token <linenum> tMUL 
%token <linenum> tDIV 
%token <strnode> tSTRING 
%token <intnode> tNUMI
%token <realnode> tNUMR 
%token tPRINT tGET tSET tFUNCTION tRETURN tIDENT tEQUALITY tIF tGT tLT tGEQ tLEQ tINC tDEC 

%type <exprNodePtr> operation;
%type <exprNodePtr> expr;
%type <exprNodePtr> fulloperation;
%start prog 

%%
prog:       '[' stmtlst ']' ;
;

stmtlst:    stmtlst stmt |
;

stmt:       setStmt | if | print | unaryOperation | expr | returnStmt
;

getExpr:    '[' tGET ',' tIDENT ',' '[' exprList ']' ']'
        | '[' tGET ',' tIDENT ',' '[' ']' ']'
        | '[' tGET ',' tIDENT ']'
;

setStmt:    '[' tSET ',' tIDENT ',' expr ']'
;

if:     '[' tIF ',' condition ',' '[' stmtlst ']' ']'
        | '[' tIF ',' condition ',' '[' stmtlst ']' '[' stmtlst ']' ']'
;

print:      '[' tPRINT ',' expr ']'
;


operation:  '[' tADD ',' expr  ',' expr']' {
                                            $$ = SumExpr($4, $6);
                                            }
        | '[' tSUB ',' expr',' expr']'{
                                            $$ = SubExpr($4, $6);
                                            }
        | '[' tMUL ',' expr ',' expr ']'{
                                            $$ = MulExpr($4, $6);
                                            }
        | '[' tDIV ',' expr ',' expr  ']'{
                                            $$ = DivExpr($4, $6);
                                            }
;   


unaryOperation: '[' tINC ',' tIDENT ']'
        | '[' tDEC ',' tIDENT ']'
;

fulloperation: 
                operation stmt{printres($1);}|
                operation {$$ = $1;}
;


    

expr:       tNUMI {$$ = makeExpressionNodeFromInt($1);}
            | tNUMR {$$ = makeExpressionNodeFromReal($1);}
            | tSTRING {$$ = makeExpressionNodeFromStr($1);}
            | getExpr | function | fulloperation {$$ = $1;} | condition
;

function:    '[' tFUNCTION ',' '[' parametersList ']' ',' '[' stmtlst ']' ']'
        | '[' tFUNCTION ',' '[' ']' ',' '[' stmtlst ']' ']'
;

condition:  '[' tEQUALITY ',' expr ',' expr ']'
        | '[' tGT ',' expr ',' expr ']'
        | '[' tLT ',' expr ',' expr ']'
        | '[' tGEQ ',' expr ',' expr ']'
        | '[' tLEQ ',' expr ',' expr ']'
;

returnStmt: '[' tRETURN ',' expr ']'
        | '[' tRETURN ']'
;

parametersList: parametersList ',' tIDENT | tIDENT
;

exprList:   exprList ',' expr | expr
;

Solution

  • For your language, which looks very lisp-like, I would go for a tree, where each rule creates a node that it returns back up.

    For example for the single-branch if statement:

    $$ = makeIfNode($3, $6, NULL)
    

    For the nodes itself I would emulate C++-like inheritance using embedded nodes, where the top base-node contains a function pointer that is initialized to a pointer to evaluate the current node.

    Like for example:

    struct BaseNode
    {
        int (*evaluate)(struct BaseNode *);
    };
    

    And for the if statement:

    struct IfNode
    {
        struct BaseNode base;
        struct BaseNode *condition;
        struct BaseNode *ifTrue;
        struct BaseNode *ifFalse;
    };
    

    The function to create the if statement node:

    struct BaseNode *makeIfNode(struct BaseNode *condition, struct BaseNode *ifTrue, struct BaseNode *ifFalse)
    {
        struct IfNode *ifNode = malloc(sizeof *ifNode);
        // TODO: Error checking
    
        ifNode->base.evaluate = &evaluateIfNode;
        ifNode->condition = condition;
        ifNode->ifTrue = ifTrue;
        ifNode->ifFalse = ifFalse;
    
        return (struct BaseNode *) ifNode;
    }
    
    static struct BaseNode *evaluateIfNode(struct BaseNode *base)
    {
        struct IfNode *ifNode = (struct IfNode *) base;
    
        int result = evaluate(ifNode->condition);
        if (result != 0)
        {
            return evaluate(ifNode->ifTrue);
        }
        else if (ifNode->ifFalse != NULL)
        {
            return evaluate(ifNode->ifTrue);
        }
    }
    

    The global evaluate function takes the function pointer and calls it with the current node:

    int evaluate(struct BaseNode *base)
    {
        return base->evaluate(base);
    }
    

    Creates node structures, node creation functions and node evaluation functions similar to this for all your expressions and all your statements, and your top rule prog will simply call printf("Result of program: %d\n", evaluate($2)); to run all the statements and all the expressions and print the final result.