Search code examples
c++compiler-construction

How to distinguish between negative and minus signs in compiler


I am writing a small PL/0 compiler for practice.

I ran into some problems while writing the expression evaluation part.

Here is a small example:

-2 + 1

The logic of my program is as below:

  1. lexical analysis

    According to the document I found, there is no such thing like 'negative integer' in PL/0, thus -2 will be parsed as - and 2. The result of this step is: -, 2, + and 1

  2. syntax analysis

    I use SLR1 in this step, since the expression is correct, nothing will happen.

  3. semantic analysis

    The program will build an abstract syntax tree in this step. The principle is simple, an operator stack and an operand stack work together to finish this part. However, the "negative" sign will be treated as a "minus" sign in my code, and I have no good idea to deal with this problem.

Here is a piece of code showing the logic:

struct ASTNode {
  ASTNode(char op, int val, ASTNode *left = nullptr, ASTNode *right = nullptr);

  char _op;                   // operator
  int _val;                   // operand
  shared_ptr<ASTNode> _left;  // left child node
  shared_ptr<ASTNode> _right; // right child node
};

void semantic_analyzer::construct_tree(vector<string> &tokens) {
  stack<shared_ptr<ASTNode>> nodeStack;  // operand stack
  stack<char> opStack;                   // operator stack

  // iterate tokens and construct part of the AST
  for (auto &token : tokens) {
    // if the token is a positive integer
    if (str_opekit::is_digit(token)) {
      int val = stoi(token);
      nodeStack.push(make_shared<ASTNode>(' ', val)); // the children will be set as nullptr
    }
    // If the token is a left paren, push it into operator stack
    else if (token == "(") {
      opStack.push('(');
    }
    // If the token is a right paren, pop the operator in the operator stack and calculate result
    else if (token == ")") {
      // Keep popping operators in the operator stack until reach a left paren
      while (opStack.top() != '(') {
        // Pop two operands in the operand stack and construct them into a node
        shared_ptr<ASTNode> right = nodeStack.top();
        nodeStack.pop();
        shared_ptr<ASTNode> left = nodeStack.top();
        nodeStack.pop();
        // Construct a new ASTNode with the two nodes and push into the stack
        shared_ptr<ASTNode> node(new ASTNode(opStack.top(), 0, left, right));
        opStack.pop();
        nodeStack.push(node);
      }
      opStack.pop();  // pop left paren
    }
    // If the token is #, which stands for the end of the expression
    else if (token == "#") {
      break;
    }
    // If the token is an operator, compare its priority with the operator on the top
    // of the stack
    else {
      char op = token[0];
      // while:
      // 1. the operator stack is not empty
      // 2. the operator on the top of the stack is not a left paren
      // 3. the priority of the operator on the top of the stack is higher
      while (!opStack.empty() && opStack.top() != '(' &&
             is_prior(op, opStack.top()) ) {
        // Pop two operands in the stack and construct them into an ASTNode
        shared_ptr<ASTNode> right = nodeStack.top();
        nodeStack.pop();
        shared_ptr<ASTNode> left = nodeStack.top();
        nodeStack.pop();
        // Construct a new ASTNode with the two nodes and push it into the
        // operand stack
        shared_ptr<ASTNode> node =
            make_shared<ASTNode>(opStack.top(), 0, left, right);
        opStack.pop();
        nodeStack.push(node);
      }
      opStack.push(op);
    }
  }

  // Process operators and operands in the stack and construct the tree
  while (!opStack.empty()) {
    shared_ptr<ASTNode> right = nodeStack.top();
    nodeStack.pop();
    shared_ptr<ASTNode> left = nodeStack.top();
    nodeStack.pop();
    shared_ptr<ASTNode> node =
        make_shared<ASTNode>(opStack.top(), 0, left, right);
    opStack.pop();
    nodeStack.push(node);
  }

  this->_root = nodeStack.top();
  nodeStack.pop();
}

Since the "negative" sign will be parsed as "minus" sign, not only the result of execution will be wrong, but also the program will call pop() on an empty stack, which will cause a crash.

If you need more detailed information about my code, please inform me. Any suggestion is welcomed.


The EBNF is from a course I took before. The part associated with the question will be as follows:

  1. letter ::= a|b|...|X|Y|Z
  2. digit ::= 0|1|...|8|9
  3. identifier ::= <letter>{<letter>|<digit>}
  4. unsigned integer ::= <digit>{<digit>}
  5. factor ::= <identifier>|<unsigned integer>|...
  6. item ::= <factor>{<multiplying operator><factor>}
  7. expression ::= [+|-]<item>{<adding operator><item>}

'|' stands for "|" in regex

'{...}' stands for "*" in regex

'[...]' stands for "zero or one", having some resemblance to "?" in regex


Solution

  • Your parser must build the AST of e.g. expressions, consisting of other expressions and operators (items).

    The - operator can exist in both

    • prefix position (as a prefix operator) before an expression and in
    • infix position, between two expressions.

    When your parser parses the first, - token, it must check if it can or cannot exist in a prefix position, and parse it (emit a prefix AST node with operator and expression) or generate an error. If your parser detects a - token in an infix position, it must check if the token can be in infix position, and generate an infix AST node with both left and right expressions and the operator.