I have begun writing a polymorphic recursive descent parser in C++. However I am running an issue. The classes are set up like this:
class Node {
public:
std::vector<Node*> children;
};
class NodeBinary : public Node {
public:
Node* left;
Node* right;
};
class NodeUnary : public Node {
public:
Node* operand;
};
class NodeVar : public Node {
public:
std::string string;
NodeVar(std::string str) : string(str) {};
};
class NodeNumber : public Node {
public:
signed long number;
NodeNumber(signed long n) : number(n) {};
};
// etc.
And then classes like NodeDeclaration
, NodeCall
, NodeNot
, NodeAssignment
, NodePlus
, NodeMinus
, NodeIf
etc. will inherit either from Node
or something less generic like NodeBinary
or NodeUnary
.
However, some of them take more specific operands. NodeAssignment
always takes a var and a number/expression. So I will have to override Node* left to NodeVar* left and NodeExpr* right. The problem comes in with things like NodePlus
. Left can be a NodeVar
or a NodeExpr
! And the root node has a similar problem: while parsing at the top level to add children nodes to root, how is it possible to tell if a child is a NodeExpr
, a NodePlus
, a NodeIf
etc...?
I could have all Nodes have a enum "type" that says what type it is, but then whats the point of having a nice polymorphic inheritance tree?
How is is this problem normally solved??
If you're using class inheritance for your AST nodes, you need to create an appropriate inheritance hierarchy, as with any object-oriented design.
So, for example, NodeAssignment
(which is presumably a specialization of NodeStatement
) needs to contain a NodeLValue
(of which a NodeVariable
is a specialization) and a NodeValue
. As usual, LValues (i.e. things you can assign to) are a subset of Values, so NodeLValue
will be a specialization of NodeValue
. And so on. Your binary operator node will contain left
and right
members, both of which are NodeValue
base objects (I would expect NodeValue
to be pure virtual, with a large number of specific specializations.)
If you insist on using a recursive descent parser, each parsing function needs to return an appropriate subclass of Node
, so that the function which parses the left-hand side of an assignment would logically return a NodeLValue*
, ready to insert into the NodeAssignment
constructor. (Frankly, I'd ditch the word Node
in all of those class names. Put them all into the namespace node::
and save yourself some typing.)