I am writing a custom expression evaluator to further understand how it works. I'm currently at the first stage of tokenization.
I've encountered the following problem many times before in different programming languages (Python and C#) and I don't understand how I should properly handle this.
I want to turn an expression stored as a string into a vector of tokens that can be used for evaluating the expression later on. I don't want to just store a vector of strings since that'd be a little annoying to deal with during the evaluation.
Here are a few example tokens:
Of course, storing these tokens in a vector means I'd have to read the token later on, and this is where the crux of the problem lies.
How do I create a class structure that satisfies these requirements?
I first attempted this by creating classes based on what they have and all of them inheriting from a base class.
(The following code only shows the declaration.)
class Token {};
class NumberToken : public Token {
public:
int value;
NumberToken(int value);
};
class OperatorToken : public Token {
public:
OperatorType operator_type;
OperatorToken(OperatorType operator_type);
};
class BracketToken : public Token {
public:
BracketType bracket_type;
BracketToken(BracketType bracket_type);
};
The problem I had with this was encountered when putting them all in a vector and trying to read from it:
using sptr = std::shared_ptr; // Only used for readability purposes in this code snippet.
// Creating the vector is nice!
sptr<NumberToken> num = std::make_shared<NumberToken>(NumberToken(100));
sptr<OperatorToken> op = std::make_shared<OperatorToken>(OperatorToken(OperatorType::Addition));
sptr<BracketToken> br = std::make_shared<BracketToken>(BracketToken(BracketType::Open));
std::vector<sptr<Token>> tokens = {num, op, br};
// But reading from it requires dynamic_cast which from what I heard means is an indication of bad code design due to it being type switching...
for (sptr<Token> token : tokens) {
if (NumberToken* numToken = dynamic_cast<NumberToken*>(token)) {
// code for handing a number token
} else if (OperatorToken* opToken = dynamic_cast<OperatorToken*>(token)) {
// code for handing an operator token
}
...
}
My second attempt involved the base class just having all the different methods from each class as pure virtual methods but.. obviously that can't be right. For example, the NumberToken class would have the operator_type field which doesn't make any sense.
Whether you use dynamic_cast
or something else, the relevant factor is that there is a closed, known set of tokens.
The traditional solution for this is the visitor pattern.
Define a base class for handling a token:
struct TokenVisitor {
virtual void visit(const NumberToken&) = 0;
virtual void visit(const OperatorToken&) = 0;
virtual void visit(const BracketToken&) = 0;
};
The base token needs to be able to accept a visitor:
class Token {
public:
virtual void accept(TokenVisitor& visitor) const = 0;
// other stuff
};
And each concrete token needs to implement the visitation:
class NumberToken: public Token {
public:
void accept(TokenVisitor& visitor) const override {
visitor.visit(*this);
}
// other stuff
};
Then you define concrete visitors containing "code for handling NumberToken", etc. There are no dynamic_cast
necessary. There is a small amount of boilerplate in the concrete token classes. The biggest disadvantage is putting the "handling" code in a visitor isn't as clear as a simple switch-like for loop.
However, since C++17, there is another option, avoiding inheritance:
using Token = std::variant<NumberToken, OperatorToken, BracketToken>;
Visitation is as simple as:
std::visit([&](const auto& concreteToken) {
if (is_number_token(concreteToken)) {
...
} else if (...) {
...
}
}, token);
// for explanation, can be done either ways:
template <typename T>
bool is_number_token(const T&) {
return std::is_same_v<T, NumberToken>();
}