I am writing a small PL/0 compiler for practice.
I ran into some problems while writing the expression evaluation part.
Here is a small example:
-2 + 1
The logic of my program is as below:
lexical analysis
According to the document I found, there is no such thing like 'negative integer' in PL/0, thus -2
will be parsed as -
and 2
. The result of this step is: -
, 2
, +
and 1
syntax analysis
I use SLR1 in this step, since the expression is correct, nothing will happen.
semantic analysis
The program will build an abstract syntax tree in this step. The principle is simple, an operator stack and an operand stack work together to finish this part. However, the "negative" sign will be treated as a "minus" sign in my code, and I have no good idea to deal with this problem.
Here is a piece of code showing the logic:
struct ASTNode {
ASTNode(char op, int val, ASTNode *left = nullptr, ASTNode *right = nullptr);
char _op; // operator
int _val; // operand
shared_ptr<ASTNode> _left; // left child node
shared_ptr<ASTNode> _right; // right child node
};
void semantic_analyzer::construct_tree(vector<string> &tokens) {
stack<shared_ptr<ASTNode>> nodeStack; // operand stack
stack<char> opStack; // operator stack
// iterate tokens and construct part of the AST
for (auto &token : tokens) {
// if the token is a positive integer
if (str_opekit::is_digit(token)) {
int val = stoi(token);
nodeStack.push(make_shared<ASTNode>(' ', val)); // the children will be set as nullptr
}
// If the token is a left paren, push it into operator stack
else if (token == "(") {
opStack.push('(');
}
// If the token is a right paren, pop the operator in the operator stack and calculate result
else if (token == ")") {
// Keep popping operators in the operator stack until reach a left paren
while (opStack.top() != '(') {
// Pop two operands in the operand stack and construct them into a node
shared_ptr<ASTNode> right = nodeStack.top();
nodeStack.pop();
shared_ptr<ASTNode> left = nodeStack.top();
nodeStack.pop();
// Construct a new ASTNode with the two nodes and push into the stack
shared_ptr<ASTNode> node(new ASTNode(opStack.top(), 0, left, right));
opStack.pop();
nodeStack.push(node);
}
opStack.pop(); // pop left paren
}
// If the token is #, which stands for the end of the expression
else if (token == "#") {
break;
}
// If the token is an operator, compare its priority with the operator on the top
// of the stack
else {
char op = token[0];
// while:
// 1. the operator stack is not empty
// 2. the operator on the top of the stack is not a left paren
// 3. the priority of the operator on the top of the stack is higher
while (!opStack.empty() && opStack.top() != '(' &&
is_prior(op, opStack.top()) ) {
// Pop two operands in the stack and construct them into an ASTNode
shared_ptr<ASTNode> right = nodeStack.top();
nodeStack.pop();
shared_ptr<ASTNode> left = nodeStack.top();
nodeStack.pop();
// Construct a new ASTNode with the two nodes and push it into the
// operand stack
shared_ptr<ASTNode> node =
make_shared<ASTNode>(opStack.top(), 0, left, right);
opStack.pop();
nodeStack.push(node);
}
opStack.push(op);
}
}
// Process operators and operands in the stack and construct the tree
while (!opStack.empty()) {
shared_ptr<ASTNode> right = nodeStack.top();
nodeStack.pop();
shared_ptr<ASTNode> left = nodeStack.top();
nodeStack.pop();
shared_ptr<ASTNode> node =
make_shared<ASTNode>(opStack.top(), 0, left, right);
opStack.pop();
nodeStack.push(node);
}
this->_root = nodeStack.top();
nodeStack.pop();
}
Since the "negative" sign will be parsed as "minus" sign, not only the result of execution will be wrong, but also the program will call pop()
on an empty stack, which will cause a crash.
If you need more detailed information about my code, please inform me. Any suggestion is welcomed.
The EBNF is from a course I took before. The part associated with the question will be as follows:
letter ::= a|b|...|X|Y|Z
digit ::= 0|1|...|8|9
identifier ::= <letter>{<letter>|<digit>}
unsigned integer ::= <digit>{<digit>}
factor ::= <identifier>|<unsigned integer>|...
item ::= <factor>{<multiplying operator><factor>}
expression ::= [+|-]<item>{<adding operator><item>}
'|' stands for "|" in regex
'{...}' stands for "*" in regex
'[...]' stands for "zero or one", having some resemblance to "?" in regex
Your parser must build the AST of e.g. expressions, consisting of other expressions and operators (items).
The -
operator can exist in both
expression
and inexpression
s.When your parser parses the first, -
token, it must check if it can or cannot exist in a prefix position, and parse it (emit a prefix AST node with operator and expression) or generate an error. If your parser detects a -
token in an infix position, it must check if the token can be in infix position, and generate an infix AST node with both left
and right
expressions and the operator
.