Sorry for the complicated title, but it's a bit hard to explain in just one sentence.
So I'm writing a simple interpreted language to help with some stuff that I often do. I have a lexer set up, feeding into an abstract syntax tree generator.
The Abstract Syntax Tree spits out Expressions. (Which I'm passing around using unique_ptrs). There's several types of expressions that are derived from this base class, which include:
etc. Each derived class contains the info it needs for that expression, i.e. variables contain a std::string of their identifier, binary operations contain unique_ptrs to the left and right hand side as well as a char of the operator.
Now this is working perfectly, and expressions are parsed just as they should be.
This is what an AST would look like for 'x=y*6^(z-4)+5'
+--Assignment (=)--+
| |
Var (x) +--------BinOp (+)----+
| |
5 +------------BinOp (*)---+
| |
+---------BinOp (^)-------+ Var (y)
| |
Num (6) +------BinOp (-)-----+
| |
Var (z) Num (4)
The issue arises when trying to decouple the AST from the interpreter. I want to keep it decoupled in case I want to provide support for compilation in the future, or whatever. Plus the AST is already getting decently complex and I don't want to add to it. I only want the AST to have information about how to take tokens and convert them, in the right order, into an expression tree.
Now, the interpreter should be able to traverse this list of top down expressions, and recursively evaluate each subexpression, adding definitions to memory, evaluating constants, assigning definitions to their functions, etc. But, each evaluation must return a value so that I can recursively traverse the expression tree.
For example, a binary operation expression must recursively evaluate the left hand side and the right hand side, and then perform an addition of the two sides and return that.
Now, the issue is, the AST returns pointers to the base class, Expr – not the derived types. Calling getExpression returns the next expression regardless of it's derived type, which allows me to easily recursively evaluate binary operations and etc. In order for the interpreter to get the information about these expressions (the number value, or identifier for example), I would have to basically dynamically cast each expression and check if it works, and I'd have to do this repeatedly. Another way would be to do something like the Visitor pattern – the Expr calls the interpreter and passes this to it, which allows the interpreter to have multiple definitions for each derived type. But again, the interpreter must return a value!
This is why I can't use the visitor pattern – I have to return values, which would completely couple the AST to the interpreter.
I also can't use a strategy pattern because each strategy returns wildly different things. The interpreter strategy would be too different from the LLVM strategy, for example.
I'm at a complete loss of what to do here. One really gumpy solution would be to literally have an enum of each expression type as a member of the expr base class, and the interpreter could check the type and then make the appropriate typecast. But that's ugly. Really ugly.
What are my options here? Thanks!
The usual answer (as done with most parser generators) is to have both a token type value and associated data (called attributes in discussion of such things). The type value is generally a simple integer and says "number", "string" "binary op" etc. When deciding what production the use you examine only the token types and when you get a match to a production rule you then know what kind of tokens feed into that rule.
If you want to implement this yourself look up parsing algorithms (LALR and GLR are a couple examples), or you could switch to using a parser generator and only have to worry about getting your grammar correct and then proper implementation of the productions and not have to concern yourself with implementing the parsing engine yourself.