I'm trying to make a very basic tokenizer/lexer. To do this, I'm making a base struct called `Token` that all types of tokens will inherit from, such as `IntToken` and `PlusToken`.
Every new type of token will include a `type` variable as a string, and a `to_string` function that returns a representation like `Token(PLUS)` or `Token(INT, 5)` (where 5 is replaced by the token's integer value).
I've looked at many questions on SO, and it looks like I need to make a vector of type `std::shared_ptr<BaseClass>` (in my case, `BaseClass` would be `Token`): https://stackoverflow.com/a/20127962/12101554
I first tried doing this the way I thought it should work. When that failed, I searched SO and found the answer linked above, but following it doesn't seem to work either.
Am I following the answer wrong, did I make some other error, or is this not possible in C++ without a lot of extra code?
(I have also tried converting all the `struct`s to `class`es and adding `public:`, but that makes no difference.)
```cpp
#include <cctype>   // std::isdigit
#include <iostream>
#include <memory>   // std::shared_ptr, std::make_shared
#include <string>
#include <vector>

// Base type that every kind of token inherits from
struct Token {
    std::string type = "Uninitialized";
    virtual std::string to_string() { return "Not implemented"; }
};

struct IntToken : public Token {
    IntToken(int value) {
        this->value = value;
    }
    std::string type = "INT";
    int value;
    std::string to_string() {
        return "Token(INT, " + std::to_string(value) + ")";
    }
};

struct PlusToken : public Token {
    std::string type = "PLUS";
};

std::vector<std::shared_ptr<Token>> tokenize(std::string input) {
    std::vector<std::shared_ptr<Token>> tokens;
    for (int i = 0; i < input.length(); i++) {
        char c = input[i];
        if (std::isdigit(static_cast<unsigned char>(c))) {
            std::cout << "Digit" << std::endl;
            IntToken t = IntToken(c - '0');  // convert the digit character to its value
            std::cout << t.value << std::endl;
            tokens.push_back(std::make_shared<IntToken>(t));
        }
        else if (c == '+') {
            std::cout << "Plus" << std::endl;
            PlusToken p = PlusToken();
            tokens.push_back(std::make_shared<PlusToken>(p));
        }
    }
    return tokens;
}

int main()
{
    std::string input = "5+55";
    std::vector<std::shared_ptr<Token>> tokens = tokenize(input);
    for (int i = 0; i < tokens.size(); i++) {
        //std::cout << tokens[i]->to_string() << std::endl;
        std::cout << tokens[i]->type << std::endl;
    }
}
```
Current output:
```
Digit
5
Plus
Digit
5
Digit
5
Uninitialized
Uninitialized
Uninitialized
Uninitialized
```
Expected output (with the current code):
```
Digit
5
Plus
Digit
5
Digit
5
Token(INT, 5)
Token(PLUS)
Token(INT, 5)
Token(INT, 5)
```
Note: Yes, I know that the proper tokenization would be (5) (+) (55), but I'm still creating the basic part.
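You are giving your derived classes their own `type` member variables. Each of those declarations hides `Token::type` rather than replacing it, and data members, unlike `virtual` functions, are not looked up by the object's dynamic type. So when you read `type` through a `std::shared_ptr<Token>`, you get the base-class member, which is still `"Uninitialized"`. Instead, you should set the `type` that belongs to the base class inside the derived-class constructors.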