Search code examples
javaparsinggrammartokenizearithmetic-expressions

How to write a recursive while loop for arimethic calculation java parser


I have written code for an arithmetic calculation parser with the following grammar

'exp' ::= term | term + exp | term - exp

term ::= integer literal

And I have finished the parser for a single term containing only integer literal, however, I cannot achieve the goal of parsing operations in the string equation.

In this application, I was trying to check whether there is a next token present by using Tokenizer.hasNext() method, and if the token is of type Token.Type.Add then return a new Exp with the current literal term and operation and the next Exp that is parsed through the parse() method. I have defined various classes such as Token(String token, Type type),Exp(Term) / Exp(Term,Op,Exp),Term(Lit),Lit(int)

Tokenizer.takeNext() will take the next token and remove it from the current tokenizer buffer. I simply cannot parse operators from the given equations.

Parsing equation: 73 + 65 
73=73
Parsing equation: 10 - 4
10=10
Parsing equation: 7 + 9 + 10
7=7
Parsing equation: 5 - 1
5=5

This is a general approach that I toke from school lecture and it's not homework problem. Any help will be appreciated.

public class Token {
    public enum Type {Unknown, Lit, Add, Minus, Multiply, Division};
    private String _token = "";
    private Type _type = Type.Unknown;
}
public enum Operation {
    None (""),
    Add ("+"),
    Sub ("-"),
    Mult ("*"),
    Div ("/");

    String op;
public class Exp {

    Term _term = null;
    Exp _exp = null;
    Operation _op = null;

    public Exp(Term term) {
        _term = term;
        _op = Operation.None;
    }

    public Exp(Term term, Operation op, Exp exp) {
        _exp = exp;
        _term = term;
        _op = op;
    }
     public Exp parse(){
        Exp term = parseTerm();
        while(_tokenizer.hasNext()) {
            Token token = _tokenizer.takeNext();
            if(token.type() == Token.Type.Add) {
                Operation op = Operation.Add;
                _tokenizer.next();
                Exp exp = parse();
                term = new Exp(term._term,op,exp);
                }
            }
        return term;
    }

    // <term>   ::= <integer literal>
    public Exp parseTerm(){
        Exp exp = null;
        String Lit = "";
        while(_tokenizer.hasNext()) {
            Token token = _tokenizer.takeNext();
            if(token.type() == Token.Type.Lit)
                Lit+=token.token();
            else
                parse();
            }
        Lit lit = new Lit(Integer.parseInt(Lit));
        Term term = new Term(lit);
        exp = new Exp(term);
        return exp;
        }

Solution

  • Let's walk through this for the input 73 + 65:

    You call parse, which calls parseTerm. parseTerm then loops over the tokens. For the first token, it's a literal, so it's added to Lit (PS: it's not good style for variable names to start with capital letters). Then the + token is read and it goes into the else, calling parse. Now parse calls parseLit again, which reads the literal for 65. The token stream is now empty, so parseLit returns. parse also returns because the token stream is still empty (note that the loop in parse has never been entered).

    The value returned from parse to parseLit is the literal expression 65, but parseLit never actually uses that value. It just calls parse and throws the result away.

    So now we're back in the first call of parseLit and parse has just returned. So the token stream is empty and we exit the loop. The content of Lit is "73", so that's what's returned from this call to parseLit. Now we return to the first call to parse, which simply returns the result of parseLit because, again, the token stream is empty, so the loop never gets entered.

    So what went wrong here is that you consumed all of the input without actually building the tree for the addition. The code that's meant to do that (i.e. the while loop in parse) never runs because you've already read all the tokens before it ever gets to the loop because of the loop in parseTerm and the second call to parse (whose result you discard).

    Your grammar rule for term does not use exp, so parseTerm shouldn't be calling parse. And the rule is also not recursive, so parseTerm shouldn't contain a loop. All that parseTerm should do is read a single token that should be a literal and then return the appropriate Exp object for that.